We introduce the concept of quantum tensor product expanders. These generalize the concept of quantum expanders, which are quantum maps that are efficient randomizers and use only a small number of Kraus operators. Quantum tensor product expanders act on several copies of a given system, where the Kraus operators are tensor products of the Kraus operator on a single system. We begin with the classical case, and show that a classical two-copy expander can be used to produce a quantum expander. We then discuss the quantum case and give applications to the Solovay-Kitaev problem. We give probabilistic constructions in both classical and quantum cases, giving tight bounds on the expectation value of the largest nontrivial eigenvalue in the quantum case.

I. BACKGROUND: CLASSICAL AND QUANTUM EXPANDERS

A. Definitions 

The concept of t-designs [1] provides a way of randomizing quantum states. For example, a 1-design is a set of unitaries $\{U_k\}$, where $k = 1, \ldots, K$, such that the average over the set takes any input state to a maximally mixed state. A 2-design is a set of unitaries such that applying $U_k \otimes U_k$ to a state on a bipartite system generates the twirling operation [2]. Quantum expanders, as studied in Hamiltonian complexity [3], computer science [4], and quantum information theory [5], provide a way of approximately realizing a 1-design by repeatedly applying a completely positive map built out of a small number of unitaries. In this paper, we introduce the concept of "tensor product expanders", which generalize this result and give us a way to approximately realize t-designs. We also discuss the classical case, and show that classical tensor product expanders can be used to generate quantum expanders.

Quantum expanders are a quantum analogue of expander graphsiq. In the quantum case, we consider a completely 
positive, trace preserving map

$$\mathcal{E}(M) = \sum_{s=1}^{D} A^\dagger(s)\, M\, A(s), \qquad (1)$$

where the number of Kraus operators $D$ is relatively small and the map $\mathcal{E}$ has a spectral gap between the largest eigenvalue, equal to unity, and the next largest eigenvalue^1. We write the spectrum of $\mathcal{E}$ as $\lambda_1, \lambda_2, \ldots$ with $\lambda_1 = 1$ and $\lambda_2, \ldots$ all bounded in absolute value by some $\lambda < 1$. We can equivalently consider the operator $\hat{\mathcal{E}} := \sum_{s=1}^{D} A(s) \otimes A(s)^*$. In this paper we consider the case in which the operators $A(s)$ are proportional to unitary operators:

$$A(s) = \frac{1}{\sqrt{D}}\, U(s). \qquad (2)$$

Then the expander map can be implemented by choosing $s$ uniformly at random from $\{1, \ldots, D\}$, and then applying $U(s)$ to the quantum state. The natural generalization of this process, in which we consider $k$ copies of a quantum system, choose a unitary at random, and apply the unitary to all $k$ copies, will be called a $k$-copy tensor product expander. We will show that these give a way to approximate t-designs for $t = k$.

Random walks on expander graphs can be viewed similarly, as acting on a distribution with a randomly chosen permutation matrix. Consider a directed graph, where each node has $D$ edges leaving it. Label the edges from 1 up to $D$ such that each label appears exactly once among the incoming edges of each vertex and exactly once among the outgoing edges of each vertex. Then, for each edge label $s$, $1 \le s \le D$, define a permutation $\pi_s$, where $\pi_s(i) = j$ if a directed edge with label $s$ goes from node $i$ to node $j$. Then, given a random walk on the graph, the probability distribution $p(i)$ changes in a single step by

$$p \to \frac{1}{D} \sum_{s=1}^{D} P(s)\, p,$$

where $P(s)$ is the permutation matrix corresponding to the permutation $\pi_s$; i.e. $P(s)_{ij} = 1$ if $\pi_s(j) = i$ and 0 otherwise.

^1 In the non-Hermitian case discussed below, we define the gap instead to be one minus the second-largest singular value of the map $\mathcal{E}$.

Hermitian expanders: It is sometimes convenient to guarantee that an expander we construct is Hermitian. To obtain Hermitian $\mathcal{E}$ in the quantum case, we impose

$$U(s + D/2) = U(s)^\dagger.$$

Similarly, in the classical case, we impose

$$\pi_{s+D/2} = \pi_s^{-1}.$$

This turns the directed graph into an undirected graph. For notational convenience, we identify $s + D$ with $s$ throughout this paper, so that $s$ is a periodic variable with period $D$. Note that this constraint requires that $D$ be even. There do exist other ways to construct Hermitian expanders with odd $D$, if for some $s$ we have $U(s) = U(s)^\dagger$.

B. Application to state randomization 

For classical expanders, an important implication of the spectral gap is that random walks on an expander graph 
rapidly approach the stationary distribution. Similarly, quantum expanders can be shown to be rapid mixing. This 
has application to the problem of state randomization, in which classical randomness is used to map a quantum 
state to an output that is close in trace distance to the maximally mixed state. Ideally the constructions would 
be [computationally] efficient, meaning they run in time polynomial in the number of qubits, and would use as few 
random bits as possible. 

To make this concrete, suppose that $\mathcal{E}$ is Hermitian and unital with gap $1 - \lambda$, and consider a quantum state $\rho$. We wish to bound the trace norm distance between the maximally mixed state and the state $\mathcal{E}^m(\rho)$ obtained by acting on $\rho$ with some high power, $m$, of the map $\mathcal{E}$. The calculation exactly follows the classical case. We begin by bounding the $\ell_2$ distance. For a matrix $A$, define $\|A\|_2 = \sqrt{\mathrm{tr}\, A^\dagger A}$ and $\|A\|_1 = \mathrm{tr}\, |A| = \mathrm{tr} \sqrt{A^\dagger A}$. Then

$$\left\| \mathcal{E}^m(\rho) - \frac{I}{N} \right\|_2 \le \lambda^m,$$

as may be shown by writing $\rho$ as a linear combination of eigenvectors of $\mathcal{E}$, and then by Cauchy-Schwarz,

$$\left\| \mathcal{E}^m(\rho) - \frac{I}{N} \right\|_1 \le \sqrt{N}\, \lambda^m.$$

Thus, to obtain a given bound on the trace norm distance $\epsilon$, it suffices to take

$$m \ge \log_\lambda(\epsilon/\sqrt{N}). \qquad (6)$$

This implies that the set of unitaries, consisting of all unitaries of the form $U(s_1) U(s_2) \cdots U(s_m)$, gives an $\epsilon$-approximate 1-design using

$$K := D^m = \left( \frac{N}{\epsilon^2} \right)^{\frac{1}{2} \log(D)/\log(1/\lambda)}$$

unitaries.
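The mixing bound above can be checked numerically on a small instance. The following is a minimal sketch, assuming NumPy and Haar-random unitaries sampled by QR decomposition; the helper names (`haar_unitary`, `hermitian_expander`, `superoperator`, `mixing_table`) and the parameter choices are illustrative and do not come from the paper.

```python
import numpy as np

def haar_unitary(n, rng):
    """Sample an n x n unitary from the Haar measure via QR of a Ginibre matrix."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diagonal(r)
    return q * (d / np.abs(d))  # fix column phases so the distribution is Haar

def hermitian_expander(n, d, rng):
    """Return d unitaries with U(s + d/2) = U(s)^dagger (d is assumed even)."""
    half = [haar_unitary(n, rng) for _ in range(d // 2)]
    return half + [u.conj().T for u in half]

def superoperator(unitaries):
    """Matrix of M -> (1/D) sum_s U(s) M U(s)^dagger acting on row-major vec(M)."""
    return sum(np.kron(u, u.conj()) for u in unitaries) / len(unitaries)

def mixing_table(n=4, d=4, m_max=6, seed=0):
    """Return lambda and rows (m, trace distance to I/N, sqrt(N) * lambda^m)."""
    rng = np.random.default_rng(seed)
    us = hermitian_expander(n, d, rng)
    eigs = np.sort(np.abs(np.linalg.eigvalsh(superoperator(us))))
    lam = eigs[-2]  # second-largest eigenvalue in absolute value
    rho = np.zeros((n, n), dtype=complex)
    rho[0, 0] = 1.0  # pure input state |0><0|
    rows = []
    for m in range(1, m_max + 1):
        rho = sum(u @ rho @ u.conj().T for u in us) / d
        dist = np.linalg.norm(rho - np.eye(n) / n, ord="nuc")
        rows.append((m, dist, np.sqrt(n) * lam ** m))
    return lam, rows
```

Each row pairs the observed trace distance after $m$ steps with the bound $\sqrt{N}\lambda^m$; the observed distance should never exceed the bound.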

The exponent $\frac{1}{2} \log(D)/\log(1/\lambda)$ can be thought of as a measure of the efficiency of an expander, meaning the number of bits of randomness it requires to achieve a certain amount of state randomization. Before showing how to evaluate $\frac{1}{2} \log(D)/\log(1/\lambda)$, we review other methods of $\ell_1$ state randomization. The simplest is to apply one of $N^2$ generalized Pauli operators. This can be done efficiently (i.e. in time $\mathrm{polylog}(N)$) and perfectly randomizes any state (i.e. $\epsilon = 0$). However, it uses far more randomness than necessary when $\epsilon > 0$. Choosing $K = O(N \epsilon^{-2} \log(1/\epsilon))$ random unitaries was shown to suffice in [10], improving a result of [11] (both of which in fact addressed the more difficult problem of $\ell_\infty$ state randomization). Similarly an efficient $K = 4N\epsilon^{-2}$ construction was given in [12], which uses less randomness than the efficient constructions of [13] and even than the inefficient constructions based on random unitaries. We note in passing that the constructions in [12], [13] are based on expanders with $\lambda = \epsilon/\sqrt{N}$ and $D = K$.

An expander-based state randomization scheme will be efficient if the underlying expander is efficient and the 
number of unitaries it uses will be given by 0. Unfortunately i log(I?)/log(l/A) is larger than 2 for all known 
efficient constant-degree expander constructions fa, [a, 0| (e.g. for the Margulis expander[6j, it is ~ 8.4, and for the 
zig-z ag p roduct ill it is 2 + o(f )). However, if U{\), . . . , U{D/2) are chosen at random with U{s + D/2) — U{s)'^ then 
Ref. [ill showed that with high probabifity i \og{D)/ log(l/A) kI + 0{\og{N)N-^/^) + 2/ log(D), and thus that K 
is within a small multiplicative factor of N/e^. 

We summarize the above discussion as follows: 

Theorem 1 For any $N$ and any $\epsilon > 0$, consider a set of unitaries $U_1, \ldots, U_K \in \mathcal{U}_N$, which are taken to be strings of unitaries drawn from a set of $D/2$ unitaries $U(1), \ldots, U(D/2)$ and their conjugates for any $D \ge 4$. Then for most choices of $U(1), \ldots, U(D/2)$, choose the string length such that

K = (^) (10) 

and

$$\left\| \frac{1}{K} \sum_{k=1}^{K} U_k\, \rho\, U_k^\dagger - \frac{I}{N} \right\|_1 \le \epsilon$$

for all $N$-dimensional density matrices $\rho$.



If we take $D \approx 4N/\epsilon^2$ then Theorem 1 can be thought of as tightening the analysis of random unitaries from [10], [11], [12], so that only $(4 + o(1)) N/\epsilon^2$ random unitaries are necessary. This shows that Haar-uniform unitaries require almost exactly the same amount of randomness as the construction of [12], although they have the substantial disadvantage of requiring $\mathrm{poly}(N)$ time to implement instead of $\mathrm{poly}(\log(N))$ time. Since $\lambda \ge (2\sqrt{D-1}/D - O(1/N)) \cdot (1 - O(\log\log(N)/\log(N)))$ for any quantum expander that includes its own inverses [19], one can show that $4N/\epsilon^2$ is the minimum possible value of $K$ for any expander-based randomizing map.

Apart from random unitaries and the large-D constructions of [l^, Il3| ■, we know of one other class of quantum 
expanders for which i log(£')/log(l/A) ~ 1. These are obtained by applying the prescription of Q to the SU{2) 
expanders described by Lubotsky, Phillips and Sarnak in 14]. Such expanders exist for any N whenever D is odd 
and 2D — 1 is prime, and satisfy A = 2^D — 1 exactly. Thus, they provide another K « AN/e^ method of performing 
state randomization. However, the only claimed efficient construction of these expanders[l5j has an incomplete proof. 

In the non-Hermitian case, (O holds when A is the second-largest singular value of an expander. If L'^(l), . . . , U{D) 
are chosen uniformly at random, then 19] proved that with high probability the singular values of £™ for m — 
0{N^^^) are bounded by N"^ {1 / ^/lJ)"'^ {\ + o(l)). This implies that the second-largest eigenvalue of £ is < -^(1 + 

0(log(A^)A^^^/^)), but does not yield meaningful bounds on the second-largest singular value of E. Indeed, Tobias 
Osborne has pointed out that when m = 1 and D — 2, the second largest singular value is equal to unity. If £™ 
turned out to have singular values nearly equal to £)~™/^ then it would imply that « N/e'^ random unitaries sufficed 
to e-randomize a state. 

We now turn to tensor product expanders, considering classical tensor product expanders in Section II and quantum tensor product expanders in Section III. The mixing analysis above generalizes in the tensor product case to give approximate t-designs. We will describe randomized constructions of both classical and quantum tensor product expanders. Our basic tool to prove that a random construction gives an expander with high probability is the trace method (see, for example [8], [18]). The basic idea of the trace method is to bound eigenvalues of some linear operator by bounding the trace of high powers of that operator. For example, for a positive definite Hermitian operator whose two largest eigenvalues are equal to unity and to $\lambda$, the trace of the $m$-th power is at least equal to $1 + \lambda^m$, so by bounding the trace we bound $\lambda$. We focus on high powers of the operator so that the trace will be dominated by the largest eigenvalues. The trace method will be adapted, with slight modifications, to the various cases, depending on whether the setting is classical or quantum, and depending on whether we consider an expander or a tensor product expander.
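The basic inequality behind the trace method can be illustrated on a synthetic matrix. This is a minimal sketch, assuming NumPy; the function names and the test spectrum are hypothetical and chosen only to show that $(\mathrm{tr}\, M^m - 1)^{1/m}$ bounds $\lambda$ from above and tightens as $m$ grows.

```python
import numpy as np

def trace_bound(matrix, m):
    """Upper bound on the second-largest eigenvalue from tr(matrix^m), m even."""
    t = np.trace(np.linalg.matrix_power(matrix, m)).real
    return max(t - 1.0, 0.0) ** (1.0 / m)

def demo(n=6, lam=0.5, seed=1):
    """Positive symmetric matrix with eigenvalues 1, lam, and smaller ones."""
    rng = np.random.default_rng(seed)
    spectrum = np.concatenate(([1.0, lam], rng.uniform(0.0, 0.3, n - 2)))
    q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # random orthogonal basis
    matrix = q @ np.diag(spectrum) @ q.T
    return [(m, trace_bound(matrix, m)) for m in (2, 4, 8, 16)]
```

For small $m$ the bound is loose because all eigenvalues contribute to the trace; for larger $m$ the term $\lambda^m$ dominates and the bound approaches $\lambda$.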



II. CLASSICAL TENSOR PRODUCT EXPANDERS 

In this section we define classical tensor product expanders, and give a random construction of them. We then 
show an application of them to constructing quantum expanders. 

A. Preliminaries, Definitions and Applications 

We define an $(N, D, \lambda, k)$ classical $k$-copy tensor product expander to be a set of $N$-by-$N$ permutation matrices $P(s)$, $1 \le s \le D$, with the property that the matrix $L_k$, defined by

$$L_k = \frac{1}{D} \sum_{s=1}^{D} P(s)^{\otimes k}, \qquad (11)$$

has some number, $f_k^N$, of eigenvalues equal to unity, with $f_k^N$ defined below, and all other eigenvalues less than or equal to $\lambda$ in absolute value. (Again, if $L_k$ is non-Hermitian then we consider its singular values.)

We can obtain Hermitian operators $L_k$ by considering $D$ even, and imposing $P(s + D/2) = P(s)^\dagger$. To obtain Hermitian $L_k$ for $D$ odd, we can instead impose $P(s) = P(s)^\dagger$; that is, the permutation matrices correspond to perfect matchings. Both models correspond to models of random graphs for $k = 1$ discussed in [9].

These expanders can also be defined by graphs with $N^k$ nodes, labelled $(n_1, n_2, \ldots, n_k)$, where $1 \le n_i \le N$. There is an edge from one node $(n_1, \ldots, n_k)$ to another node $(n'_1, \ldots, n'_k)$ if and only if one of the given permutations sends $n_1 \to n'_1, \ldots, n_k \to n'_k$. We refer to this graph as $G_k$. Alternatively, we can regard $n_1, \ldots, n_k$ as $k$ different random walkers executing a correlated random walk on the original graph.

The function $f_k^N$ is defined to be equal to the number of unit eigenvalues of the operator

$$\frac{1}{N!} \sum_{\pi \in S_N} P_\pi^{\otimes k}, \qquad (12)$$

where the sum ranges over all permutations $\pi$, and $P_\pi$ is the permutation matrix corresponding to permutation $\pi$. Since this operator performs an average over a group action, it is a projector. Applying it to a computational basis state $|n_1, \ldots, n_k\rangle$ maps it to the superposition of all $|n'_1, \ldots, n'_k\rangle$ such that $n'_i = n'_j$ iff $n_i = n_j$. Thus we can represent eigenstates by partitions of $\{1, \ldots, k\}$ into $\le N$ blocks, such that indices are equal within blocks and unequal across blocks. For example, $f_1^N = 1$, $f_2^N = 2$ (corresponding to the sum of all states with $n_1 = n_2$ and the sum of all states with $n_1 \ne n_2$), $f_3^N = 5$ (corresponding to the possibilities $n_1 = n_2 = n_3$, $n_1 = n_2 \ne n_3$, $n_1 = n_3 \ne n_2$, $n_2 = n_3 \ne n_1$, and $n_1 \ne n_2 \ne n_3 \ne n_1$), and so on. Note that if $N \ge k$ then the constraint that there be $\le N$ blocks becomes superfluous, and $f_k^N$ becomes simply the $k$-th Bell number $B_k$, which counts the total number of ways of partitioning a $k$-element set.
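The counting rule just described (partitions of a $k$-element set into at most $N$ blocks) can be evaluated with Stirling numbers of the second kind. The sketch below is illustrative; the function names `stirling2` and `f` are not from the paper, and $k \ge 1$ is assumed.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(k, j):
    """Number of partitions of a k-element set into exactly j nonempty blocks."""
    if k == j:
        return 1
    if j == 0 or j > k:
        return 0
    return j * stirling2(k - 1, j) + stirling2(k - 1, j - 1)

def f(k, n):
    """Number of partitions of {1, ..., k} into at most n blocks."""
    return sum(stirling2(k, j) for j in range(1, min(k, n) + 1))
```

For $N \ge k$ this reproduces the Bell numbers (1, 2, 5, 15, ...), while for $N < k$ the count is smaller, for instance `f(3, 2)` excludes the partition into three singletons.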

Any matrix $L_k$ of the form (11) is block diagonal with $f_k^N$ different blocks depending on the symmetry of the elements $n_1, \ldots, n_k$ under permutation; we call these subspaces $S_1, S_2, \ldots, S_{f_k^N}$. By the arguments of the above paragraph, we can write the projector in (12) as

$$\sum_{a=1}^{f_k^N} |u_a\rangle\langle u_a|$$

for some unit vectors $|u_a\rangle \in S_a$. These $|u_a\rangle$ are eigenvectors with unit eigenvalue not only of (12) but also of any $L_k$.

Rapid mixing: Given the spectral gap, repeatedly applying a classical tensor product expander many times (of order $k \log(N)$) generates an approximately $k$-wise independent permutation. This means that the results of applying it to $k$ distinct elements are almost indistinguishable from applying a single random permutation to each of the $k$ elements. More precisely, given an initial probability distribution, $p$, in any of the $f_k^N$ different subspaces $S_a$, we have

$$\| L_k^m p - u_a \|_1 \le \sqrt{N^k}\, \lambda^m, \qquad (13)$$

where $u_a$ is the $\ell_1$-normalized eigenvector with eigenvalue unity in this subspace. This approach towards generating $k$-wise independent permutations has also been considered in [16].
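As a small numerical illustration of the definition of $L_k$ and of the count of unit eigenvalues, one can take a handful of explicit permutations and diagonalize $L_k$ directly. This is a minimal sketch, assuming NumPy; the helper names are hypothetical. Because it calls `eigvalsh`, it assumes the permutation set is closed under inverses so that $L_k$ is symmetric.

```python
import numpy as np
from functools import reduce

def perm_matrix(perm):
    """P[i, j] = 1 if perm[j] == i, so P maps basis vector e_j to e_{perm[j]}."""
    n = len(perm)
    p = np.zeros((n, n))
    p[perm, np.arange(n)] = 1.0
    return p

def tensor_power(mat, k):
    """k-fold Kronecker power of a matrix."""
    return reduce(np.kron, [mat] * k)

def l_k(perms, k):
    """L_k = (1/D) sum_s P(s)^{(x)k} for a list of permutations."""
    return sum(tensor_power(perm_matrix(p), k) for p in perms) / len(perms)

def unit_eigenvalue_count(perms, k, tol=1e-9):
    """Count eigenvalues of the symmetric matrix L_k that equal one."""
    eigs = np.linalg.eigvalsh(l_k(perms, k))
    return int(np.sum(np.abs(eigs - 1.0) < tol))

# A 4-cycle, a transposition, and their inverses generate all of S_4.
EXAMPLE_PERMS = [[1, 2, 3, 0], [1, 0, 2, 3], [3, 0, 1, 2], [1, 0, 2, 3]]
```

With `EXAMPLE_PERMS` on $N = 4$ elements, the counts for $k = 1, 2, 3$ should match $f_k^N = 1, 2, 5$, since the generated group acts transitively on tuples with a given equality pattern.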

Expanders are not always tensor product expanders. The requirement that a set of permutations form a tensor product expander for $k > 1$ copies is more stringent than the requirement for $k = 1$ copy, as it implies that the correlations between elements are destroyed by the expander. For an example of a classical expander that does not give a tensor product expander, consider any set of $D$ permutation matrices, $P(s)$, on $N$ elements that gives a classical expander. Define a new set of permutation matrices, $P'(s)$, on $2N$ elements, such that $P'(s) = P(s) \oplus P(s)$ for $s = 1, \ldots, D$. Finally, define the permutation $P'(D + 1)$ which sends $i$ to $i + N$ if $i \le N$, and sends $i$ to $i - N$ if $i > N$. Then, these $D + 1$ different permutation matrices define a $k = 1$ expander (they simply correspond to two copies of the original graph, with the possibility of moving between the two copies by using permutation matrix $P'(D + 1)$), but do not define a $k = 2$ expander: if two walkers, $n_1, n_2$, originally are in the same copy as each other, then they remain in the same copy.

Another example comes from Cayley graphs. If $G$ is a group with generators $g_1, \ldots, g_D$ then the Cayley graph on $G$ is defined by taking $N = |G|$ and $P(s)|g\rangle = |g_s g\rangle$ for $s = 1, \ldots, D$. There are many Cayley graph expanders known (c.f. Section 11 of [8]), but applying $P(s) \otimes P(s)$ to any $|g\rangle \otimes |h\rangle$ produces a new state $|g'\rangle \otimes |h'\rangle$ with $g'^{-1} h' = g^{-1} h$. Thus, no Cayley graph expander can be a tensor product expander unless it is modified in some way.
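The conserved quantity $g^{-1}h$ in the Cayley graph example can be verified by brute force on a small group. The sketch below uses the symmetric group $S_3$ with permutations stored as tuples; the function names and the choice of generators are illustrative assumptions.

```python
from itertools import permutations

def compose(a, b):
    """(a * b)(x) = a(b(x)) for permutations stored as tuples."""
    return tuple(a[b[x]] for x in range(len(b)))

def inverse(a):
    """Inverse permutation of a."""
    inv = [0] * len(a)
    for x, y in enumerate(a):
        inv[y] = x
    return tuple(inv)

def invariant_preserved(generators, group):
    """Check that g^{-1} h is unchanged when g_s multiplies both walkers on the left."""
    for gs in generators:
        for g in group:
            for h in group:
                before = compose(inverse(g), h)
                after = compose(inverse(compose(gs, g)), compose(gs, h))
                if before != after:
                    return False
    return True
```

Since the relative position $g^{-1}h$ never changes, two walkers on a Cayley graph stay perfectly correlated, which is why the two-copy walk cannot mix.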

The limit of large $k$: Observe that any $k$-copy tensor product expander is also a $k'$-copy tensor product expander for all $k' \le k$. On the other hand, even if $k > N$ then the $k$ walkers can still occupy only at most $N$ positions. Thus if a map is an $N$-copy tensor product expander then it is also a $k$-copy tensor product expander for all $k$.

An equivalent condition to $\{\pi_1, \ldots, \pi_D\} \subset S_N$ being an $N$-tensor product expander is that the Cayley graph generated by $\{\pi_1, \ldots, \pi_D\}$ is an expander. The spectrum of this Cayley graph is identical (up to multiplicity) to that of $L_k$ for all $k \ge N$ (with $P(s)$ defined to be $P_{\pi_s}$).

B. Random permutations are tensor product expanders

The question then naturally arises whether $k > 1$ tensor product expanders actually exist. Of course there is a trivial $D = N!$ construction where we take $\{\pi_1, \ldots, \pi_{N!}\} = S_N$ and achieve $\lambda = 0$ for all $k$. We would prefer, though, that $D = O(1)$. The construction of [16] nearly achieves this with $D = \mathrm{polylog}(N)$ and $\lambda = 1 - 1/\mathrm{poly}(k, \log N)$. For a constant degree construction, we can use Kassabov's expander [17] on $S_N$. This achieves $D = O(1)$ and $\lambda$ equal to a constant strictly smaller than 1 for all $N$ and $k$. Additionally, it can be implemented in time $\mathrm{polylog}(N)$.

In this section, we give a randomized construction of tensor product expanders for any even $D \ge 4$ and with $\lambda \approx \lambda_H^{1/(k+1)}$, where

$$\lambda_H := \frac{2\sqrt{D-1}}{D}. \qquad (14)$$

Theorem 2 Choose $\pi_1, \ldots, \pi_{D/2} \in S_N$ at random and then take $\pi_{s+D/2} = \pi_s^{-1}$. Let $P(s) = P_{\pi_s}$. For any $k$, let $\lambda$ denote the $(f_k^N + 1)$-th largest eigenvalue of $L_k$. Then for any $c > 1$,



Pr 



A > c(xt + 0( ^°'^'^ t'r^!"'^'^^^ ))] ^ c(-^+^)^°^v^.W, (15) 



log(Ar) 

where Pr[...] denotes probability and $\lambda_H$ depends on $D$ as given in Eq. (14).

Note that since $\lambda_H^{1/(k+1)}$ converges to unity as $k$ becomes large, the result (15) is only meaningful for $k = O(\log(N)/\log(\log(N)))$. Constants depending on $D$ are also hidden inside of the $O()$ notation. The result is likely far from optimal, since numerical studies indicate that for fixed $k$ and large $N$, the largest non-trivial eigenvalue $\lambda$ approaches $\lambda_H$. This result for the case $k = 1$ was only recently proven [9]. Our proof, which gives a weaker bound on the expectation value of $\lambda$, roughly follows the presentation of the trace method in [8], [18] with some modifications.

Proof of Theorem 2: We will apply the trace method separately in each of the subspaces $S_a$. It suffices to consider only one such subspace, the subspace $S_{f_k^N}$ in which all of the $n_1, n_2, \ldots, n_k$ differ from each other, since every eigenvalue of $L_k$ is an eigenvalue of $L_k$ restricted to $S_{f_k^N}$. For example, consider the case $k = 2$. We have two different subspaces, one with $n_1 = n_2$ and one with $n_1 \ne n_2$. The eigenvectors of the first subspace, of the form $\sum_i p(i)|i\rangle|i\rangle$, correspond to eigenvectors of $L_1$ of the form $\sum_i p(i)|i\rangle$. Given such an eigenvector, we can construct an eigenvector in the second subspace equal to $\sum_{i \ne j} p(i)|i\rangle|j\rangle$ with the same eigenvalue, as claimed.

Let E[...] denote an average over different choices of permutation matrices. Then for any even $m$,

$$\mathrm{E}[|\lambda|] \le \left( \mathrm{E}[\mathrm{tr}(L_k^m R)] - 1 \right)^{1/m}, \qquad (16)$$

where $R$ is the projector onto the given subspace. The expectation value $\mathrm{E}[\mathrm{tr}(L_k^m R)]$ equals

$$\left( \frac{1}{D} \right)^m \sum_{s_1=1}^{D} \sum_{s_2=1}^{D} \cdots \sum_{s_m=1}^{D} \mathrm{E}[\mathrm{tr}(P(s_1) P(s_2) \cdots P(s_m) R)]. \qquad (17)$$

If for some $i$ we have $s_i = s_{i+1} + D/2$, then $P(s_i) P(s_{i+1}) = 1$, and we can remove that pair of permutation matrices from the trace above. Similarly, if $s_m = s_1 + D/2$, then we can remove the first and last permutation matrices from the trace, exploiting the cyclic invariance of the trace and the vanishing commutator $[P(s), R] = 0$. We can consider these operations as acting on a word $s_1, s_2, \ldots, s_m$ on an alphabet $\{1, \ldots, D\}$. We define a reduced word by removing pairs of letters of the form $s, s + D/2$. Similarly, if the word ends with a letter $s$ and begins with a letter $s + D/2$, we remove this pair also. We repeat these removals until no further removals are possible. The result is a reduced word of length $m^0 \le m$; the resulting sequence we write $s'_1, s'_2, \ldots, s'_{m^0}$. There are at most

$$(D - 1)^{m/2}\, 2^m = D^m \lambda_H^m \qquad (18)$$

choices of $s_1, \ldots, s_m$ which give $m^0 = 0$; the number of these choices is equal to $D^m$ times the return probability of a random walk of length $m$ on a Cayley tree of degree $D$. For these choices, we have $\mathrm{E}[\mathrm{tr}(P(s_1) P(s_2) \cdots P(s_m) R)] = \mathrm{tr}(R) \le N^k$.
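The word reduction and the count in Eq. (18) can be explored by brute force for small $D$ and $m$. The sketch below labels letters $0, \ldots, D-1$ (rather than $1, \ldots, D$) so that the inverse of letter $s$ is $(s + D/2) \bmod D$; the function names are hypothetical.

```python
from itertools import product

def reduce_word(word, d):
    """Cancel adjacent pairs s, s + d/2 (mod d), then cancel across the cyclic boundary."""
    half = d // 2
    stack = []
    for s in word:
        if stack and (stack[-1] - s) % d == half:
            stack.pop()  # the new letter cancels the previous one
        else:
            stack.append(s)
    # Cyclic reduction: the first and last letters may also cancel.
    lo, hi = 0, len(stack) - 1
    while lo < hi and (stack[lo] - stack[hi]) % d == half:
        lo, hi = lo + 1, hi - 1
    return stack[lo:hi + 1]

def count_trivial(d, m):
    """Number of length-m words over d letters whose reduced word is empty."""
    return sum(1 for w in product(range(d), repeat=m) if not reduce_word(w, d))

def bound(d, m):
    """Right-hand side of Eq. (18): (D - 1)^(m/2) * 2^m, for even m."""
    return (d - 1) ** (m // 2) * 2 ** m
```

For $D = 4$ the exact counts are 4 for $m = 2$ and 28 for $m = 4$, the numbers of closed walks of those lengths on the 4-regular tree, both well below the bound in Eq. (18).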

We now consider the other choices of $s_1, \ldots, s_m$, where $m^0 > 0$. In general,

$$\mathrm{E}[\mathrm{tr}(P(s'_1) P(s'_2) \cdots P(s'_{m^0}) R)] \le N^k\, \mathrm{E}[\mathrm{tr}(P(s'_1) P(s'_2) \cdots P(s'_{m^0}) R_{1,2,\ldots,k})], \qquad (19)$$

where $R_{1,2,\ldots,k}$ projects onto the state with $n_1 = 1, n_2 = 2, \ldots, n_k = k$. To compute this expectation value, we define $v^a_0 = a$, for $1 \le a \le k$. Then, define $v^a_i$, for $i \ge 1$ and $1 \le a \le k$, to be $\pi_{s'_i}(v^a_{i-1})$. Then, the probability that $v^a_{m^0} = a$ for all $a$ is equal to the desired result. We compute this probability as follows. Consider this as happening sequentially, where first we define $v^a_1$ for all $a$, then we define $v^a_2$, and so on. We say that a choice of $v^a_i$ is "free" if at no previous step $j < i$ did we compute $\pi_{s'_j}(v^b_{j-1})$ with $s'_j = s'_i$ and $v^b_{j-1} = v^a_{i-1}$. If a choice of $v^a_i$ is free, and if $t$ values of $\pi_{s'_i}$ have been previously revealed, then we can simply pick $v^a_i$ at random from the $N - t$ possibilities, thus revealing some of the information about the permutation $\pi_{s'_i}$, and increasing $t$ by one for that permutation. If a choice is not free, then it is "forced", in which case we have no choice about the value of $\pi_{s'_i}(v^a_{i-1})$.

We say that a coincidence occurs at step $i$ for walker $a$ if this is a free step and the randomly selected vertex coincides with a previously selected vertex (previously selected by any of the walkers). Note that for $v^a_{m^0}$ to equal $a$ for all $a$, we must have at least $k$ coincidences. There are two cases: either there are at least $k + 1$ coincidences, or else there are exactly $k$ coincidences.

The probability of there being at least $k + 1$ coincidences can be computed as follows. Let $i_1, i_2, \ldots, i_{k+1}$ be the steps of the first $k + 1$ coincidences and $a_1, a_2, \ldots, a_{k+1}$ be the corresponding walkers. The probability of having these coincidences for given $i_1, \ldots$ and $a_1, \ldots$ is bounded by $(mk/(N - mk))^{k+1}$. Summing over all possible steps and walkers, we find that the probability of having at least $k + 1$ coincidences is bounded by

$$m^{k+1} k^{k+1} \left( \frac{mk}{N - mk} \right)^{k+1}. \qquad (20)$$

If there are exactly $k$ coincidences, then each walker has exactly one coincidence given that $v^a_{m^0} = a$ for all $a$. There are two possibilities: either all of the coincidences occur on the last step, or at least one coincidence does not occur on the last step. The probability of the first case is at most $(1/(N - mk))^k$. If at least one coincidence does not occur on the last step, then let walker $b$ be the first walker to have a coincidence, occurring on step $j$. Note that each of the vertices $1, \ldots, k$ must be the randomly selected vertex on exactly one coincidence, again given that $v^a_{m^0} = a$ for all $a$. Because there are no further coincidences for walker $b$, we have $s'_i = s'_{j+i}$ for all $i$. The fraction of reduced words of length $m^0$ that obey this constraint for given $j \le m^0/2$ is at most $(D - 1)^{-m^0/2}$. The fraction of words that have a reduced word of length $m^0$ is at most $(D - 1)^{m^0/2} \lambda_H^m$. Therefore, the fraction of words that have a reduced word obeying this constraint, after summing over $j$, is at most $m \lambda_H^m$. The probability of having these coincidences is bounded by $(m/(N - mk))^k$, where the factor of $m$ arises from the choice of step on which the coincidence occurs (this is in fact a large overestimate). The product of these probabilities is $m \lambda_H^m (m/(N - mk))^k$. The total of these two possibilities is

$$\left( \frac{1}{N - mk} \right)^k + \left( \frac{m}{N - mk} \right)^k m \lambda_H^m. \qquad (21)$$

Adding the sum of the expectation value over words with $m^0 = 0$ (which is bounded by $N^k \lambda_H^m$ by Eq. (18)) to $N^k$ times the sum of (20), (21), we find that

$$\mathrm{E}[\mathrm{tr}(P(s'_1) P(s'_2) \cdots P(s'_{m^0}) R)] \le N^k \lambda_H^m + N^k m^{k+1} k^{k+1} \left( \frac{mk}{N - mk} \right)^{k+1} + \left( \frac{N}{N - mk} \right)^k + \left( \frac{Nm}{N - mk} \right)^k m \lambda_H^m, \qquad (22)$$

and therefore

$$\mathrm{E}[\mathrm{tr}(P(s'_1) P(s'_2) \cdots P(s'_{m^0}) R)] - 1$$
$$\le N^k \lambda_H^m + N^k m^{k+1} k^{k+1} \left( \frac{mk}{N - mk} \right)^{k+1} + \left[ \left( \frac{N}{N - mk} \right)^k - 1 \right] + \left( \frac{Nm}{N - mk} \right)^k m \lambda_H^m$$
$$= N^k \lambda_H^m + N^k m^{k+1} k^{k+1} \left( \frac{mk}{N - mk} \right)^{k+1} + O(mk^2/N) + \left( \frac{Nm}{N - mk} \right)^k m \lambda_H^m. \qquad (23)$$



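We pick

$$m = (k + 1) \log_{1/\lambda_H}(N) \qquad (24)$$

to minimize this expectation value, finding

$$\left( \mathrm{E}[\mathrm{tr}(P(s'_1) P(s'_2) \cdots P(s'_{m^0}) R)] - 1 \right)^{1/m} \le \lambda_H^{1/(k+1)}\, O(mk)^{2(k+1)/m}. \qquad (25)$$

Applying Markov's inequality then yields the proof of the Theorem. □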

C. Quantum expanders from classical tensor product expanders 

One application of $k = 2$ classical tensor product expanders is to constructing quantum expanders. We give two constructions.

The first approach was introduced, but not formally analyzed, in [3]. Let $P(s)$ be a set of random permutation matrices defining a $k = 2$ tensor product expander, as in the random construction of a $k = 2$ tensor product expander above. Then, define $\sigma(s)$, for $s = 1, \ldots, D$, to be a diagonal matrix. For $s = 1, \ldots, D/2$ we choose $\sigma(s)$ to have diagonal entries $\pm 1$ chosen independently at random and we choose $\sigma(s + D/2) = P(s) \sigma(s) P(s)^\dagger$. Then, in [2] it was shown numerically that the $A$ matrices,

$$A(s) = \frac{1}{\sqrt{D}} P(s) \sigma(s), \qquad (26)$$

define a quantum expander with high probability. Note that the choice of $\sigma(s + D/2)$ is such that $A(s + D/2) = A(s)^\dagger = (1/\sqrt{D}) \sigma(s) P(s)^\dagger$, so that this is a Hermitian expander because $P(s)^\dagger = P(s + D/2)$. Numerically, $\lambda$ was observed to approach $\lambda_H$ for large $N$. We now prove that we do indeed get a quantum expander with high probability, but with a weaker bound on $\lambda$.
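The construction of Eq. (26) can be written out directly to confirm the Hermiticity relation $A(s + D/2) = A(s)^\dagger$ and the normalization $\sum_s A(s)^\dagger A(s) = I$. This is a minimal sketch, assuming NumPy; the function name and the small parameter values are illustrative only.

```python
import numpy as np

def signed_permutation_kraus(n, d, seed=0):
    """Kraus operators A(s) = P(s) sigma(s) / sqrt(d) with the Hermitian pairing."""
    rng = np.random.default_rng(seed)
    half = d // 2
    p_mats, sigmas = [], []
    for _ in range(half):
        perm = rng.permutation(n)
        p = np.zeros((n, n))
        p[perm, np.arange(n)] = 1.0  # P maps e_j to e_{perm[j]}
        p_mats.append(p)
        sigmas.append(np.diag(rng.choice([-1.0, 1.0], size=n)))
    # Second half: P(s + D/2) = P(s)^T and sigma(s + D/2) = P(s) sigma(s) P(s)^T.
    first_p, first_sigma = list(p_mats), list(sigmas)
    p_mats += [p.T for p in first_p]
    sigmas += [p @ s @ p.T for p, s in zip(first_p, first_sigma)]
    return [p @ s / np.sqrt(d) for p, s in zip(p_mats, sigmas)]
```

The pairing works because $P(s)^T P(s) \sigma(s) P(s)^T = \sigma(s) P(s)^T$, which is the transpose of $P(s)\sigma(s)$ for a real diagonal $\sigma(s)$.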

Theorem 3 Choose $\pi_1, \ldots, \pi_{D/2} \in S_N$ at random and then take $\pi_{s+D/2} = \pi_s^{-1}$. Let $P(s) = P_{\pi_s}$. Choose $\sigma(s)$ as described above. Let $\lambda$ denote the second largest eigenvalue of the map with Kraus operators given by the matrices $A(s)$ in Eq. (26). Then, for any $c > 1$,

$$\Pr\left[ \lambda \ge c\, \lambda_H^{1/3} \left( 1 + O\!\left( \frac{\log(\log(N))}{\log(N)} \right) \right) \right] \le \frac{1}{c}. \qquad (27)$$



The Hermitian, completely positive map $\mathcal{E}$ defined by the $A$ matrices in (26) sends a diagonal matrix to a diagonal matrix and an off-diagonal matrix to an off-diagonal matrix. So, we consider the spectrum of $\mathcal{E}$ in the diagonal and off-diagonal sectors separately. In the diagonal sector, the spectrum of $\mathcal{E}$ is the same as that of the $k = 1$ expander defined by the given permutation matrices, and hence has a gap between the largest eigenvalue, equal to unity, and the next largest eigenvalue.

The off-diagonal sector requires a little more work. We again use the trace method. Let $\lambda$ be the largest eigenvalue in absolute value in the off-diagonal sector. Let $M(i,j)$ be an $N$-by-$N$ matrix with a one in the $i$-th row and $j$-th column, and zeroes everywhere else, so that these form a basis for the space of $N$-by-$N$ matrices. The $M(i,j)$ with $i \ne j$ form a basis for the space of off-diagonal matrices. Define $(M, N)$ to be an inner product on the space of $N$-by-$N$ matrices by $(M, N) = \mathrm{tr}(M^\dagger N)$. Then for any even $m$,

$$\mathrm{E}[|\lambda|] \le \left( \mathrm{E}\Big[ \sum_{i \ne j} \big( M(i,j), \mathcal{E}^m(M(i,j)) \big) \Big] \right)^{1/m}. \qquad (28)$$

Note that compared to Eq. (16), a factor of unity is not subtracted from the expectation value on the right-hand side of Eq. (28).

The evaluation of the right-hand side of Eq. (28) proceeds analogously to that of Eq. (16). The computation in the case $m^0 = 0$ is identical. In the case $m^0 > 0$, we again define coincidences and paths. The only difference is that now rather than just computing the probability that $v^a_{m^0} = a$ for all $a = 1, 2$, the paths come in with signs which may be plus or minus one. This can only reduce the contribution of the terms with $m^0 > 0$. We bound the case with $k + 1$ coincidences as before. We also bound the case with $k$ coincidences not all occurring on the last step as before. The only difference is the case in which all coincidences happen on the last step $i = m^0$. The probability of this happening is $(1/N)^k$. The sign, however, is completely random; it is equally likely to be plus or minus one. Thus, the paths with exactly $k$ coincidences, all occurring on step $i = m^0$, contribute zero to the expectation value (28). Thus,

$$\mathrm{E}[|\lambda|^m] \le (N^k + m) \lambda_H^m + N^k m^{k+1} k^{k+1} \left( \frac{mk}{N - mk} \right)^{k+1}. \qquad (29)$$

Picking $m$ as before, we find that $\mathrm{E}[|\lambda|] \le \lambda_H^{1/3} (1 + O(\log(\log(N))/\log(N)))$. Applying Markov's inequality yields the theorem. □

We now describe our second construction of a quantum expander from a classical tensor product expander. 

Theorem 4 Suppose $\{P(1), \ldots, P(D)\}$ form a $(N, D, 1 - \epsilon, 2)$ classical tensor product expander (i.e. $k = 2$). Assume that $N > 2$. Let

$$Z = \sum_{j=1}^{N} |j\rangle\langle j|\, e^{2\pi i j/N}$$

and $p = 1/(1 + \epsilon)$. Define a quantum operation $\mathcal{E}(M)$ with $D + 1$ Kraus operators $\sqrt{p/D}\, P(1), \ldots, \sqrt{p/D}\, P(D), \sqrt{1 - p}\, Z$. Then $\mathcal{E}$ is a $(N, D + 1, 1 - \epsilon/48)$ quantum expander.

Thus, any constant-gap classical 2-TPE can be used to construct a constant-gap quantum expander. No attempt has been made to optimize the constant 48, which we believe can be made arbitrarily close to one when $N$ is large and $\epsilon$ is close to 1.

Note that $\sqrt{p/D}\, P(1), \ldots, \sqrt{p/D}\, P(D), \sqrt{1 - p}\, Z$ is not in general Hermitian, but if $\{P(1), \ldots, P(D)\}$ is Hermitian then $\{\sqrt{p/D}\, P(1), \ldots, \sqrt{p/D}\, P(D), \sqrt{(1-p)/2}\, Z, \sqrt{(1-p)/2}\, Z^\dagger\}$ is a Hermitian $(N, D + 2, 1 - \epsilon/48)$ expander; this is proved by using the triangle inequality to relate its gap to the gap of the expander in Theorem 4.

Proof of Theorem 4: The idea is that the classical TPE randomizes the diagonal elements of the density matrix simply because it is an expander, and it randomizes the off-diagonal elements because it is a $k = 2$ TPE. Next the phase operation $Z$ adds a phase to the off-diagonal elements so that they are no longer fixed by the classical TPE. Thus the only fixed state will be the identity matrix.

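More formally, let $|\varphi_1\rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} |i\rangle|i\rangle$ and $|\varphi_2\rangle = \frac{1}{\sqrt{N(N-1)}} \sum_{i \ne j} |i\rangle|j\rangle$. These two states form an orthonormal basis for the invariant subspace of $\frac{1}{D} \sum_{s=1}^{D} P(s) \otimes P(s)$. Thus the fact that $P(1), \ldots, P(D)$ form a 2-TPE implies the bound

$$\left\| \frac{1}{D} \sum_{s=1}^{D} P(s) \otimes P(s) - \varphi_1 - \varphi_2 \right\| \le \lambda,$$

where $\varphi_i$ denotes the projector $|\varphi_i\rangle\langle\varphi_i|$.

Next, a short calculation shows that $\langle\varphi_2| Z \otimes Z^* |\varphi_2\rangle = -1/(N - 1)$. Now apply the following Lemma to the subspace orthogonal to $|\varphi_1\rangle$.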
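The "short calculation" for $\langle\varphi_2| Z \otimes Z^* |\varphi_2\rangle$ can be checked numerically. The sketch below assumes NumPy and takes $Z$ to be the diagonal matrix of $N$-th roots of unity; that choice of phases is an assumption made for illustration, and the function name is hypothetical.

```python
import numpy as np

def phi2_expectation(n):
    """Return <phi_2| Z (x) Z^* |phi_2> for Z = diag(exp(2 pi i j / n))."""
    omega = np.exp(2j * np.pi * np.arange(n) / n)
    z = np.diag(omega)
    # |phi_2> is the uniform superposition over pairs |i>|j> with i != j.
    phi2 = (np.ones((n, n)) - np.eye(n)).reshape(-1) / np.sqrt(n * (n - 1))
    zz = np.kron(z, z.conj())
    return np.vdot(phi2, zz @ phi2)
```

The value follows from $\sum_{i \ne j} \omega^{i-j} = |\sum_i \omega^i|^2 - N = -N$, divided by the normalization $N(N-1)$.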

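Lemma 1 Let $\Pi$ be a projector and let $X$ and $Y$ be operators such that $\|X\| \le 1$, $\|Y\| \le 1$, $\Pi X = X \Pi = \Pi$, $\|(I - \Pi) X (I - \Pi)\| \le 1 - \epsilon_X$ and $\|\Pi Y \Pi\| \le 1 - \epsilon_Y$. Assume $0 < \epsilon_X, \epsilon_Y < 1$. Then for any $0 < p < 1$, $\|pX + (1 - p)Y\| < 1$. Specifically,

$$\|pX + (1 - p)Y\| \le 1 - \frac{\epsilon_Y}{12} \min(p \epsilon_X, 1 - p). \qquad (30)$$

Setting $p = 1/(1 + \epsilon_X)$, we obtain

$$\|pX + (1 - p)Y\| \le 1 - \frac{\epsilon_X \epsilon_Y}{12(1 + \epsilon_X)} \le 1 - \frac{\epsilon_X \epsilon_Y}{24}. \qquad (31)$$

The Lemma is proved in the Appendix. We apply the Lemma by taking $X = \frac{1}{D} \sum_{s=1}^{D} P(s) \otimes P(s) - \varphi_1$, $Y = Z \otimes Z^* - \varphi_1$ and $\Pi = \varphi_2$. Then plugging $\epsilon_X = \epsilon$ and $\epsilon_Y = 1 - 1/(N - 1) \ge 1/2$ into (31) completes the proof of Theorem 4. □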

III. QUANTUM TENSOR PRODUCT EXPANDERS 

In this section we define quantum tensor product expanders and show that random unitaries provide a way of constructing tensor product expanders. We begin with some preliminaries and definitions, present applications to the Solovay-Kitaev problem of approximating unitaries by a string of elementary operations, and finally prove that random unitaries give tensor product expanders. The proof of this last statement begins in subsection III C; it closely follows [19] and should be read in conjunction with that paper.



A. Preliminaries, Definitions, and Applications 

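Suppose we have a collection of unitaries $\{U(1),\ldots,U(D)\} \subset \mathcal{U}_N$. Define a quantum operation
$\mathcal{E}_k$ that applies $U(s)^{\otimes k}$ for $s \in \{1,\ldots,D\}$ chosen uniformly at random. In other words

$\mathcal{E}_k(M) = \frac{1}{D}\sum_{s=1}^{D} U(s)^{\otimes k} M \left(U(s)^\dagger\right)^{\otimes k},$ (32)

where $M$ is an $N^k \times N^k$ matrix. Since an $N^k \times N^k$ matrix can also be viewed as an $N^{2k}$-dimensional
vector, we can also interpret $\mathcal{E}_k$ as a linear operator on an $N^{2k}$-dimensional vector space. Define this
operator to be

$\hat{\mathcal{E}}_k := \frac{1}{D}\sum_{s=1}^{D} U(s)^{\otimes k} \otimes \left(U(s)^*\right)^{\otimes k}.$ (33)

Note that $\mathcal{E}_k$ and $\hat{\mathcal{E}}_k$ are isospectral.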
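The passage from the map (32) to the matrix (33) rests on the identity $\mathrm{vec}(UMU^\dagger) = (U\otimes U^*)\,\mathrm{vec}(M)$. The sketch below checks this numerically in the simplest setting. It is an illustration only: it assumes $k = 1$, $N = 2$, a row-major `vec` convention, and a hand-picked unitary, and none of the helper names come from the paper.

```python
import cmath
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def dagger(a):
    """Conjugate transpose."""
    return [[a[j][i].conjugate() for j in range(len(a))]
            for i in range(len(a[0]))]

def conj(a):
    """Entrywise complex conjugate (no transpose)."""
    return [[x.conjugate() for x in row] for row in a]

def kron(a, b):
    """Kronecker product a (x) b."""
    return [[a[i][j] * b[p][q]
             for j in range(len(a[0])) for q in range(len(b[0]))]
            for i in range(len(a)) for p in range(len(b))]

def vec(m):
    """Row-major vectorization of a matrix."""
    return [x for row in m for x in row]

def example_unitary(t=0.4, a=1.1, b=-0.3):
    """A hypothetical 2x2 unitary used only for this check."""
    return [[math.cos(t), -math.sin(t) * cmath.exp(1j * a)],
            [math.sin(t) * cmath.exp(1j * b),
             math.cos(t) * cmath.exp(1j * (a + b))]]

def max_deviation():
    """Largest entrywise gap between vec(U M U^dagger) and (U (x) U^*) vec(M)."""
    u = example_unitary()
    m = [[1 + 2j, -0.5j], [3.0, 0.25 - 1j]]  # arbitrary test matrix
    lhs = vec(matmul(matmul(u, m), dagger(u)))
    op = kron(u, conj(u))
    v = vec(m)
    rhs = [sum(op[r][c] * v[c] for c in range(4)) for r in range(4)]
    return max(abs(x - y) for x, y in zip(lhs, rhs))

print(max_deviation())  # expected to be at the level of floating-point rounding
```

Because the two descriptions are related by a fixed change of representation, they share eigenvalues, which is what "isospectral" refers to above.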

In previous work [4, 5, 19], $\mathcal{E}_1$ was said to be an $(N, D, \lambda)$ quantum expander if the second-largest
eigenvalue of $\hat{\mathcal{E}}_1$ was $\le \lambda$. In fact, the definition of quantum expanders included even quantum
operations that were not mixtures of unitaries, as long as they could be expressed using $\le D$ Kraus operators. Here we
will change notation from [4, 5, 19] slightly. We say that a set of unitaries $\{U(1),\ldots,U(D)\}$ is an
$(N, D, \lambda, k)$ tensor product expander if the operator $\mathcal{E}_k$ has $F_N^k$ (defined below) eigenvalues equal
to one, and all of its other eigenvalues have absolute value $\le \lambda$. This differs from the notation of [4, 5, 19]
in that the set of unitaries, rather than the quantum operation, constitutes the quantum expander^. When $N$ and $D$ are
understood, we sometimes simply say that $\{U(1),\ldots,U(D)\}$ are a $k$-tensor product expander with gap $1 - \lambda$.

We define $F_N^k$ to be the rank of the projector

$\hat{\mathcal{T}}_k := \int_{U\in\mathcal{U}_N} U^{\otimes k}\otimes (U^*)^{\otimes k}\, dU$

or equivalently of the operation $\mathcal{T}_k$, which is defined by

$\mathcal{T}_k(M) = \int_{U\in\mathcal{U}_N} U^{\otimes k} M (U^\dagger)^{\otimes k}\, dU.$ (34)

(Throughout the paper the integration measure $dU$ will be the Haar measure.) This map is the "twirling" operation [2].
Since $\mathcal{T}_k$ is a Hermitian map and $\mathcal{T}_k(\mathcal{T}_k(M)) = \mathcal{T}_k(M)$, the map
$\mathcal{T}_k$ has all eigenvalues equal to zero or unity.
For $\pi \in \mathcal{S}_k$, the $N^k \times N^k$ matrix $P_N(\pi)$ is defined to be

$P_N(\pi) = \sum_{i_1=1}^{N}\cdots\sum_{i_k=1}^{N} |i_1,\ldots,i_k\rangle\langle i_{\pi(1)},\ldots,i_{\pi(k)}|.$

Since $P_N(\pi)$ commutes with any matrix of the form $U^{\otimes k}$, it follows that
$\mathcal{T}_k(P_N(\pi)) = \mathcal{E}_k(P_N(\pi)) = P_N(\pi)$ for any $\pi$. We claim that the $P_N(\pi)$ (and their
linear combinations) constitute all of the unit eigenvalues of $\mathcal{E}_k$. This fact follows from Schur-Weyl duality,
and specifically Thm 3.3.8 of [24], which states that $\mathcal{T}_k(M) = M$ if and only if $M$ is a linear combination
of $P_N(\pi)$ operators. Thus $F_N^k = \dim \mathrm{Span}\{P_N(\pi) : \pi \in \mathcal{S}_k\}$.

An important special case is when $N \ge k$. In this case, the set
$\{P_N(\pi)|1, 2, \ldots, k\rangle : \pi \in \mathcal{S}_k\}$ is linearly independent, which implies that
$\{P_N(\pi) : \pi \in \mathcal{S}_k\}$ is linearly independent and thus that $F_N^k = k!$.
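The dimension of the span of the permutation operators can be computed by brute force for small cases. The sketch below builds each operator $\sum_{i_1,\ldots,i_k}|i_1,\ldots,i_k\rangle\langle i_{\pi(1)},\ldots,i_{\pi(k)}|$ as a flattened 0/1 matrix and computes the rank of the collection with exact rational Gaussian elimination. The function names are illustrative and the approach is only practical for small $N$ and $k$; it is meant as a sanity check, not as the method of the paper.

```python
from fractions import Fraction
from itertools import permutations, product

def perm_operator(N, perm):
    """Flattened N^k x N^k matrix of the permutation operator (0-based perm)."""
    k = len(perm)
    dim = N ** k
    index = {idx: n for n, idx in enumerate(product(range(N), repeat=k))}
    mat = [0] * (dim * dim)
    for idx, row in index.items():
        col = index[tuple(idx[perm[a]] for a in range(k))]
        mat[row * dim + col] = 1
    return mat

def rank(rows):
    """Rank of a list of integer vectors via exact Gaussian elimination."""
    rows = [[Fraction(x) for x in r] for r in rows]
    rk = 0
    for c in range(len(rows[0])):
        pivot = next((r for r in range(rk, len(rows)) if rows[r][c] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for r in range(len(rows)):
            if r != rk and rows[r][c] != 0:
                f = rows[r][c] / rows[rk][c]
                rows[r] = [x - f * y for x, y in zip(rows[r], rows[rk])]
        rk += 1
        if rk == len(rows):
            break
    return rk

def num_unit_eigenvalues(N, k):
    """dim Span{P_N(pi) : pi in S_k}, computed by brute force."""
    return rank([perm_operator(N, p) for p in permutations(range(k))])

for N, k in [(2, 2), (3, 3), (2, 3)]:
    print(N, k, num_unit_eigenvalues(N, k))
```

For $N \ge k$ the rank equals $k!$, in line with the statement above. The case $N = 2$, $k = 3$ shows that for $N < k$ the operators become linearly dependent and the rank drops below $k!$.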

In the quantum case, tensor product expanders give us a way to approximate the twirling operator $\mathcal{T}_k$ of [2].
This is because

$\|\hat{\mathcal{E}}_k^m - \hat{\mathcal{T}}_k\|_\infty \le \lambda^m,$ (35)

so whenever $\lambda < 1$, $\mathcal{E}_k^\infty = \mathcal{T}_k$. Let us consider various other possibilities for
implementing twirling as a sum of different unitary transformations: one approach to exactly implementing the twirling
operation is to use t-designs [1], but the



^ One can slightly generalize this by defining a set of unitaries and a set of associated probabilities to be a tensor product expander; 
however in this paper we consider applying each unitary with equal probability summing to unity. 
number of unitaries that must be implemented in this case grows with $N$. Another approach was discussed in [20],
which avoids having the number of unitaries grow in $N$, but requires the ability to implement a number of unitaries
growing linearly in the logarithm of the error of the approximation. In contrast, tensor product expanders require only
the ability to implement a constant number of unitaries to get arbitrarily good approximations. This is a definite
advantage; however, in practice, our construction of tensor product expanders here, which relies on the ability to
construct random unitary operations, probably cannot be efficiently implemented using gates; instead, we would like
to efficiently implement a deterministically constructed tensor product expander. This raises the interesting question
of whether the constructions of [5] can lead to tensor product expanders also.

The limit of large k: The situation when $k$ is large has some similarities to the classical case. It still holds that any
$(N, D, \lambda, k)$ quantum tensor product expander is also an $(N, D, \lambda, k')$ quantum tensor product expander for
all $k' \le k$. In particular, if a set of unitaries forms an $(N, D, \lambda, \infty)$ quantum tensor product expander
then it is also an $(N, D, \lambda, k)$ quantum tensor product expander for any finite $k$. This is equivalent to
generating a Cayley graph expander on $\mathcal{U}_N$. One difference between the quantum and classical cases is that
there is no upper bound to the size of irreps of $\mathcal{U}_N$, like there is for $\mathcal{S}_N$.

Note that constant degree Cayley graph expanders are known for $\mathcal{U}_2$; indeed, choosing the matrices at random will
yield an expander with probability one [261] . However, no proof of this fact is known for N > 2. 

B. Solovay-Kitaev gate approximation 

One application of tensor product expanders is to the problem of approximating an arbitrary $V \in \mathcal{U}_N$ with a
string of gates from a fixed universal set $\{U(1),\ldots,U(D)\}$. The fact that $\{U(1),\ldots,U(D)\}$ is universal
means that $\langle U(1),\ldots,U(D)\rangle$ is dense in $\mathcal{U}_N$ (optionally neglecting an overall phase). This
means that for any $V \in \mathcal{U}_N$ and any $\epsilon > 0$, there exists a string $s_1,\ldots,s_m$ such that
$U(s_1)U(s_2)\cdots U(s_m)$ is within a distance $\epsilon$ of $V$. Often we also want to know (a) how quickly $m$ grows
with $1/\epsilon$ and (b) how long it takes to find $s_1,\ldots,s_m$. When $\{U(1),\ldots,U(D)\}$ contain their own
inverses, the Solovay-Kitaev theorem [21] gives a $\mathrm{poly}\log(1/\epsilon)$ time (for fixed $N$) algorithm to find
an $\epsilon$-approximation with $m = O(\log^{3.97}(1/\epsilon))$. Very little is known in the case without access to
inverses, except that $U(s)^\dagger$ can be simulated to error $\epsilon$ using $O(1/\epsilon^{N^2})$ applications of
$U(s)$, meaning that the Solovay-Kitaev construction can be used with this amount of overhead.

Turning to lower bounds, observe that a ball of radius $\epsilon$ in $\mathcal{U}_N$ has volume $\Theta(\epsilon^{N^2})$.
This implies that approximating every element of $\mathcal{U}_N$ to within error $\epsilon$ requires
$\Omega((1/\epsilon)^{N^2})$ different strings, or equivalently an $\Omega(N^2 \log 1/\epsilon)$ string length. A
long-standing open question is whether the Solovay-Kitaev approximation can in general be improved to use the optimal
$O(\log 1/\epsilon)$ number of gates. Such optimally short approximations are known to exist whenever a particular random
walk on $\mathcal{U}_N$ has a gap [23]: specifically, the walk consisting of multiplying by $U(s)$ for $s$ randomly
chosen from $1,\ldots,D$. For $\mathcal{U}_2$, it was recently proven that generic $U(1),\ldots,U(D)$ are gapped [23]
and thus yield short approximating strings. However, the situation for $\mathcal{U}_N$ for $N > 2$ remains open.
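The volume-counting lower bound can be turned into a rough number. The sketch below is a back-of-the-envelope estimate only: it assumes a ball of radius $\epsilon$ has volume exactly $\epsilon^{N^2}$ (dropping the constant hidden in the $\Theta$), so that at least $(1/\epsilon)^{N^2}$ strings are needed, and it solves $D^m \ge (1/\epsilon)^{N^2}$ for $m$. The function name is illustrative.

```python
import math

def min_string_length(N, D, eps):
    """Smallest m with D**m >= (1/eps)**(N*N); constants in the volume are dropped."""
    return math.ceil(N * N * math.log(1.0 / eps) / math.log(D))

# Illustrative values: single-qubit gates (N = 2) drawn from D = 4 generators.
for eps in (0.25, 1e-2, 1e-4):
    print(eps, min_string_length(2, 4, eps))
```

The estimate grows like $N^2\log(1/\epsilon)$ for fixed $D$, which is the scaling that the Solovay-Kitaev bound $O(\log^{c}(1/\epsilon))$ with $c > 1$ fails to match.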

In this section we will prove that when $k$ is sufficiently large, unitaries forming $k$-tensor product expanders yield
optimal $O(N^2 \log 1/\epsilon)$-length $\epsilon$-approximations for any gate in $\mathcal{U}_N$.

Theorem 5 . Suppose {U{1), . . . ,U{D)} form a k-tensor product expander with gap 1 — A for k 3> 2S_L/£i, 

Then for any $V \in \mathcal{U}_N$ there exists a string $s_1,\ldots,s_m \in \{1,\ldots,D\}$ with
$m = \Theta(N^2 \log_{1/\lambda}(1/\epsilon))$ and $d(V, U(s_1)U(s_2)\cdots U(s_m)) \le \epsilon$.

Here we define the distance between two unitaries $d(U, V)$ by

$d(U,V) = \min_{\phi\in[0,2\pi]} \|U - e^{i\phi}V\|_2^2 = 2N - 2|\mathrm{tr}\, U^\dagger V|,$

so that it ignores overall phase.
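The closed form $2N - 2|\mathrm{tr}\,U^\dagger V|$ can be compared against a direct minimization over the phase. The sketch below assumes the norm in the definition is the squared Hilbert-Schmidt (Frobenius) norm, which is the reading under which the closed form holds; the function names and the grid search are illustrative choices.

```python
import cmath
import math

def phase_distance(U, V):
    """Closed form 2N - 2|tr(U^dagger V)| for square matrices given as lists of rows."""
    n = len(U)
    tr = sum(U[j][i].conjugate() * V[j][i] for i in range(n) for j in range(n))
    return 2 * n - 2 * abs(tr)

def brute_force_distance(U, V, steps=3600):
    """Minimize the squared Frobenius norm of U - e^{i phi} V over a grid of phases."""
    n = len(U)
    best = float("inf")
    for t in range(steps):
        phase = cmath.exp(2j * math.pi * t / steps)
        val = sum(abs(U[i][j] - phase * V[i][j]) ** 2
                  for i in range(n) for j in range(n))
        best = min(best, val)
    return best

identity = [[1 + 0j, 0j], [0j, 1 + 0j]]
global_phase = [[cmath.exp(0.7j), 0j], [0j, cmath.exp(0.7j)]]
s_gate = [[1 + 0j, 0j], [0j, 1j]]

print(phase_distance(identity, global_phase))   # a pure global phase costs nothing
print(phase_distance(identity, s_gate), brute_force_distance(identity, s_gate))
```

The first value illustrates that the distance ignores an overall phase, and the second pair shows the closed form agreeing with the grid minimum for a gate that differs from the identity by more than a phase.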



The main result from [22] can be thought of as a weaker version of Theorem 5: it requires $k = \infty$ to achieve the same
conclusion. Unfortunately, Theorem 6 only shows that generic sets of unitaries are $k$-tensor product expanders for
$k \lesssim N^{1/6}/\log(N)$. Thus, at present the existence of expanders satisfying the assumptions of Theorem 5 is a
nontrivial conjecture. It is possible that there exists some strengthening of the results of Theorem 6 which will allow
us to show that generic unitaries fulfill the assumptions of Theorem 5.

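Proof of Theorem 5. Let $|\Phi\rangle = \frac{1}{\sqrt{N}}\sum_{i=1}^{N}|i\rangle|i\rangle$ be the maximally entangled
state on $\mathbb{C}^N \otimes \mathbb{C}^N$, and write $\Phi = |\Phi\rangle\langle\Phi|$. Define
$\rho(U) = [(U \otimes I)\Phi(U^\dagger \otimes I)]^{\otimes k}$. Observe that

$\mathrm{tr}\,\rho(U)\rho(V) = |\mathrm{tr}\, U^\dagger V|^{2k}/N^{2k} = \left(1 - \frac{d(U,V)}{2N}\right)^{2k}.$ (36)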
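Let $B_{\epsilon/3}$ be the ball of radius $\epsilon/3$ around the identity:
$B_{\epsilon/3} = \{U \mid d(U, I) \le \epsilon/3\}$. Let $\mathrm{Vol}(\epsilon/3)$ denote the volume
of $B_{\epsilon/3}$, which is $\Theta((\epsilon/3)^{N^2})$. Define

$\rho_\epsilon(U) = \frac{1}{\mathrm{Vol}(\epsilon/3)}\int_{V\in B_{\epsilon/3}} \rho(UV)\, dV.$ (37)

Similarly we define

$\rho_H = \int_{V\in\mathcal{U}_N} \rho(V)\, dV.$ (38)

These states are normalized so that $\mathrm{tr}\,\rho_\epsilon(U) = \mathrm{tr}\,\rho_H = 1$. Since $\rho(V) \ge 0$ for
all $V$, we have the operator inequality $\rho_\epsilon(U) \le \rho_H/\mathrm{Vol}(\epsilon/3)$ for any $U$. Also observe
that $\rho_H = (\mathcal{T}_k \otimes \mathrm{id}_N^{\otimes k})(\rho(U))$ for any $U$, where $\mathrm{id}_N$ denotes the
identity operation on $N \times N$ density matrices.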

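We will find it convenient to think of density matrices as vectors with the Hilbert-Schmidt inner product
$\langle A, B\rangle = \mathrm{tr}\, A^\dagger B$. In this picture $\mathcal{T}_k$ is a projector, and so

$\mathrm{tr}\,\rho_\epsilon(U)\rho_H = \mathrm{tr}\,\rho_\epsilon(U)(\mathcal{T}_k \otimes \mathrm{id}_N^{\otimes k})(\rho_\epsilon(U)) = \mathrm{tr}\,\rho_H^2.$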

To bound trjo|^, observe that the support of pn lies within Span{|f/')®'' : |^) G C^ }, which (according to [2J, [2a |l 
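has dimension $\binom{k+N^2-1}{k} = (k+1)(k+2)\cdots(k+N^2-1)/(N^2-1)! \le k^{N^2}$. Thus
$\mathrm{tr}\,\rho_H^2 \ge k^{-N^2}$.
Now we use the fact that $\|\hat{\mathcal{E}}_k^m - \hat{\mathcal{T}}_k\|_\infty \le \lambda^m$ together with
Cauchy-Schwarz to bound

$\mathrm{tr}\,\rho_\epsilon(I)\mathcal{E}_k^m(\rho_\epsilon(U)) \ge \mathrm{tr}\,\rho_\epsilon(I)\rho_H - \lambda^m\,\mathrm{tr}\,\rho_\epsilon(I)^2 \ge \mathrm{tr}\,\rho_H^2\left(1 - \frac{\lambda^m}{\mathrm{Vol}(\epsilon/3)^2}\right) \ge \frac{1}{2}\mathrm{tr}\,\rho_H^2 \ge \frac{1}{2k^{N^2}},$ (39)

where in the second-to-last step we have assumed
$m \ge \log(2/\mathrm{Vol}(\epsilon/3)^2)/\log(1/\lambda) = \Theta(N^2 \log_{1/\lambda}(1/\epsilon))$.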
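The condition $m \ge \log(2/\mathrm{Vol}(\epsilon/3)^2)/\log(1/\lambda)$ is easy to evaluate for concrete parameters. The sketch below is illustrative only: it assumes $\mathrm{Vol}(\epsilon/3) = (\epsilon/3)^{N^2}$ exactly, dropping the constant hidden in the $\Theta$, and works with logarithms to avoid underflow. The function name is hypothetical.

```python
import math

def mixing_length(N, eps, lam):
    """Smallest integer m with lam**m <= Vol**2 / 2, taking Vol = (eps/3)**(N*N)."""
    log_inv_vol = N * N * math.log(3.0 / eps)  # log(1/Vol) under the stated assumption
    return math.ceil((math.log(2.0) + 2.0 * log_inv_vol) / math.log(1.0 / lam))

# Illustrative values: one qubit (N = 2), eps = 0.3, expander eigenvalue lam = 1/2.
m = mixing_length(2, 0.3, 0.5)
vol = (0.3 / 3.0) ** 4
print(m, 0.5 ** m <= vol ** 2 / 2)
```

For fixed $\lambda$ the result grows like $N^2\log(1/\epsilon)$, which is the string length claimed in Theorem 5.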
On the other hand, if there is no string $s_1,\ldots,s_m$ such that $d(U(s_1)U(s_2)\cdots U(s_m), U) < \epsilon$, then

trp,(/)r»(p,(C/)) < (1 - -^)'' < e-#. (40) 

If fc/logfc 3> N^ /e then ([39]) and (|40)) cannot simultaneously hold. Therefore there must exist at least one string 
$s_1,\ldots,s_m$ for which $d(U(s_1)U(s_2)\cdots U(s_m), U) < \epsilon$. □

C. Trace Method and Schwinger-Dyson Equations 

The next three sections are devoted to the expansion properties of randomly chosen unitaries. Recall that we would
like to construct a quantum tensor product expander by randomly choosing $U(1),\ldots,U(D) \in \mathcal{U}_N$. There are
two cases. In the non-Hermitian case, the unitary matrices $U(s)$ are chosen independently with the Haar measure. In
the Hermitian case, $D$ is even and the unitary matrices $U(s)$ for $s = 1,\ldots,D/2$ are chosen independently with the
Haar measure and $U(s + D/2) = U(s)^\dagger$, so that $\mathcal{E}_k$ is a Hermitian operator. We focus on the Hermitian
case, and the techniques can be readily extended to cover the non-Hermitian case. Our main result is that for random
$U(s)$, with high probability we do indeed get a tensor product expander:

Theorem 6. Let $\{U(1),\ldots,U(D/2)\}$ be chosen randomly with the Haar measure from the unitary group
$\mathcal{U}_N$, and let $U(s + D/2) = U(s)^\dagger$. Let $k \le O(N^{1/6}/\log(N))$ and let $\lambda$ denote the
$(F_N^k + 1)$st eigenvalue of $\mathcal{E}_k$ as defined in (32). Then, for any $c \ge 1$,

$\Pr\left[\lambda \ge c\,(1 + O(k\log(N)N^{-1/6}))\,\lambda_H\right] \le c^{-(1/4k)N^{1/6}},$ (41)

where $\lambda_H$ depends on $D$ and is given in Eq. (62).

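We use a trace method to bound the eigenvalues of $\mathcal{E}_k$. We have

$\sum_{i_1,i_2,\ldots,i_k}\ \sum_{j_1,j_2,\ldots,j_k} \left\langle M(i_1,j_1)\otimes M(i_2,j_2)\otimes\cdots\otimes M(i_k,j_k),\ \mathcal{E}_k^m\big(M(i_1,j_1)\otimes M(i_2,j_2)\otimes\cdots\otimes M(i_k,j_k)\big)\right\rangle = \sum_{a=1}^{N^{2k}} \lambda_a^m \ge k! + |\lambda|^m,$ (42)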
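where we pick $m$ to be an even integer. We will derive bounds on the expectation value of the trace to bound the
expectation of $|\lambda|^m$. Eq. (42) can be rewritten as

$k! + |\lambda|^m \le \left(\frac{1}{D}\right)^m \sum_{s_1=1}^{D}\sum_{s_2=1}^{D}\cdots\sum_{s_m=1}^{D} \mathrm{tr}\big(U(s_m + D/2)\cdots U(s_2 + D/2)U(s_1 + D/2)\big)^k\, \mathrm{tr}\big(U(s_1)U(s_2)\cdots U(s_m)\big)^k.$ (43)

Let $E[\ldots]$ denote the average over the unitary group. Averaging Eq. (43) we find

$E_{1,k} \equiv \left(\frac{1}{D}\right)^m \sum_{s_1=1}^{D}\sum_{s_2=1}^{D}\cdots\sum_{s_m=1}^{D} E_{0,k}(s_1,\ldots,s_m) \ge k! + E[|\lambda|^m],$ (44)

$E_{0,k}(s_1,\ldots,s_m) = E\big[\mathrm{tr}\big(U^\dagger(s_m)\cdots U^\dagger(s_2)U^\dagger(s_1)\big)^k\, \mathrm{tr}\big(U(s_1)U(s_2)\cdots U(s_m)\big)^k\big]$ (45)

$\qquad = E\big[\mathrm{tr}\big(U(s_m + D/2)\cdots U(s_2 + D/2)U(s_1 + D/2)\big)^k\, \mathrm{tr}\big(U(s_1)U(s_2)\cdots U(s_m)\big)^k\big].$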

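As in [19], we write the average in Eq. (45) as an average of the form

$E[L_1L_2\cdots L_c],$ (46)

where

$L_1 = \mathrm{tr}\big(U(s_{1,1})U(s_{1,2})\cdots U(s_{1,m_1})\big),\quad L_2 = \mathrm{tr}\big(U(s_{2,1})U(s_{2,2})\cdots U(s_{2,m_2})\big),\quad \ldots$ (47)

Here we have an average of $c$ traces, each of which is a product of some number of unitary matrices. In particular,
Eq. (45) has $c = 2k$, with $L_1 = L_2 = \ldots = L_k = \overline{L}_{k+1} = \ldots = \overline{L}_{2k}$.
The Schwinger-Dyson equations for a product of this form are [19]:

$E\big[\mathrm{tr}\big(U(s_{1,1})U(s_{1,2})\cdots U(s_{1,m_1})\big)L_2\cdots L_c\big]$ (48)

$\quad = -\frac{1}{N}\sum_{j=2}^{m_1} \delta_{s_{1,1},s_{1,j}}\, E\big[\mathrm{tr}\big(U(s_{1,1})\cdots U(s_{1,j-1})\big)\,\mathrm{tr}\big(U(s_{1,j})\cdots U(s_{1,m_1})\big)L_2\cdots L_c\big]$

$\quad\ + \frac{1}{N}\sum_{j=2}^{m_1} \delta_{s_{1,1},s_{1,j}+D/2}\, E\big[\mathrm{tr}\big(U(s_{1,2})\cdots U(s_{1,j-1})\big)\,\mathrm{tr}\big(U(s_{1,j+1})\cdots U(s_{1,m_1})\big)L_2\cdots L_c\big]$

$\quad\ - \frac{1}{N}\sum_{l=2}^{c}\sum_{j=1}^{m_l} \delta_{s_{1,1},s_{l,j}}\, E\big[\mathrm{tr}\big(U(s_{1,1})\cdots U(s_{1,m_1})U(s_{l,j})U(s_{l,j+1})\cdots U(s_{l,j-1})\big)L_2\cdots L_{l-1}L_{l+1}\cdots L_c\big]$

$\quad\ + \frac{1}{N}\sum_{l=2}^{c}\sum_{j=1}^{m_l} \delta_{s_{1,1},s_{l,j}+D/2}\, E\big[\mathrm{tr}\big(U(s_{1,2})\cdots U(s_{1,m_1})U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,j-1})\big)L_2\cdots L_{l-1}L_{l+1}\cdots L_c\big].$

Note that in the above equation an expression like $U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,j-1})$ means
$U(s_{l,j+1})U(s_{l,j+2})\cdots U(s_{l,m_l})U(s_{l,1})U(s_{l,2})\cdots U(s_{l,j-1})$.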

Our general algorithm for reducing traces starts by canceling all pairs of matrices $U(s)U(s + D/2)$ appearing
successively in the same trace, and replacing tr(1) by $N$. We then apply Eq. (48), repeating the cancellation of
successive $U(s)U(s + D/2)$ and replacement of tr(1) by $N$ on each iteration. A term terminates at a given level $n$ if
there are no matrices left after $n$ iterations.
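The cancellation step of this algorithm amounts to reducing a word in the generators. The sketch below is a minimal illustration under stated assumptions: generators are labelled $1,\ldots,D$ with label $s + D/2$ standing for the inverse of label $s \le D/2$, and, because a trace is cyclic, a pair formed by the last and first letters is also treated as adjacent. The function name is hypothetical.

```python
def reduce_trace_word(word, D):
    """Cancel adjacent pairs U(s)U(s + D/2) in a cyclic word; return what is left."""
    half = D // 2

    def inverse(s):
        # Labels 1..D/2 are paired with D/2+1..D.
        return s + half if s <= half else s - half

    # Linear pass: a stack removes every adjacent letter/inverse pair.
    stack = []
    for s in word:
        if stack and stack[-1] == inverse(s):
            stack.pop()
        else:
            stack.append(s)

    # Cyclic pass: the trace also lets the last letter cancel against the first.
    lo, hi = 0, len(stack) - 1
    while lo < hi and stack[lo] == inverse(stack[hi]):
        lo += 1
        hi -= 1
    return stack[lo:hi + 1]

# With D = 4, labels 3 and 4 are the inverses of labels 1 and 2.
print(reduce_trace_word([1, 2, 4, 3], 4))  # everything cancels: the trace is tr(1) = N
print(reduce_trace_word([1, 2, 3], 4))     # tr(U(1) U(2) U(1)^dagger) = tr(U(2))
```

An empty result corresponds to tr(1), which the algorithm replaces by $N$; otherwise the length of the returned word is the length of the trace that enters the Schwinger-Dyson iteration.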

Let $m_1^0$ be the length of the trace after canceling successive $U(s)U(s+D/2)$ before any iterations; on every
successive iteration, the length of the first trace, $m_1$, is bounded by $m_1^0$. As in [19], the number of different
choices of $s_1,\ldots,s_m$ which give rise to a given $m_1^0$ is bounded by

$(D-1)^{m_1^0/2}(D-1)^{m/2}2^m.$ (49)

This number is equal to $D^m$ times the probability that a random walker on a Cayley tree arrives at a distance $m_1^0$
from the starting point after a walk of $m$ steps. This number is independent of the particular values of
$s_{1,1},\ldots,s_{1,m_1^0}$.
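For small parameters the counting statement can be checked by enumeration. The sketch below lists all $D^m$ words, computes the distance from the origin on the tree as the freely reduced length of each word, and compares the number of words at each distance $d$ with the bound $(D-1)^{d/2}(D-1)^{m/2}2^m$. It assumes labels $1,\ldots,D$ with $s + D/2$ the inverse of $s \le D/2$; the function names are illustrative and the enumeration is exponential in $m$.

```python
from collections import Counter
from itertools import product

def reduced_length(word, D):
    """Length of the word after cancelling adjacent pairs U(s)U(s + D/2)."""
    half = D // 2
    stack = []
    for s in word:
        inv = s + half if s <= half else s - half
        if stack and stack[-1] == inv:
            stack.pop()
        else:
            stack.append(s)
    return len(stack)

def counts_by_distance(D, m):
    """Number of length-m words ending at each distance on the D-regular tree."""
    counts = Counter()
    for word in product(range(1, D + 1), repeat=m):
        counts[reduced_length(word, D)] += 1
    return counts

def walk_bound(D, m, d):
    """The bound (D-1)^(d/2) (D-1)^(m/2) 2^m on the number of such words."""
    return (D - 1) ** (d / 2) * (D - 1) ** (m / 2) * 2 ** m

D, m = 4, 4
for d, count in sorted(counts_by_distance(D, m).items()):
    print(d, count, walk_bound(D, m, d))
```

Dividing each count by $D^m$ gives the random-walk probability mentioned above. For these small values the bound is far from tight, which is expected since it is designed for large $m$.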

There are $[D/(D-1)](D-1)^{m_1^0}$ different possible values of $s_{1,1},\ldots,s_{1,m_1^0}$ and therefore the total number
of choices of $s_1,\ldots,s_m$ which give rise to a given choice of $s_{1,1},\ldots,s_{1,m_1^0}$ is bounded by

$\frac{D-1}{D}\left(\frac{1}{\sqrt{D-1}}\right)^{m_1^0}(D-1)^{m/2}2^m.$ (50)
The number of terms terminating at the $n$th level is bounded by

$(2km-1)^n.$ (51)

To see this, note that at each iteration of the Schwinger-Dyson equation, the number of terms on the right-hand side
is bounded by the number of matrices on the left-hand side minus one. Initially, there are $2km$ matrices, and this
number does not increase under Eq. (48).

We can estimate the value of a term which terminates at a given level $n \ge 1$ as follows. First, there is a sign equal
to plus or minus 1. Next, there is a factor of $(1/N)^n$. Finally, there is a factor of $N$ for each trace of the form tr(1)
that appeared in this process. Suppose there are $p$ such traces, giving a factor of $N^p$. How big can $p$ be? Initially we
have $c = 2k$ different traces. The given term at level $n$ arose from a specific choice of terms on the right-hand side of
Eq. (48) on the first iteration. This specific choice has $k_1$ different traces in it, with $k_1$ equal to either $2k - 1$
or $2k + 1$. After the second iteration there are $k_2$ traces, then $k_3$, and so on. The number of traces
$k_2, k_3, \ldots$ can be determined as follows: an application of Eq. (48) may increase the number of traces by one if
the term arises from the first or second line on the right-hand side, or may decrease the number of traces by one if the
term arises from the third or fourth line on the right-hand side of Eq. (48). Next, some of the traces may be trivial,
being equal to tr(1). In the event that the term arose from the first, second, or third line of Eq. (48) it is not
possible for any of the traces to be trivial, under the assumption that any repetitions of the form $U(s)U(s + D/2)$ have
been previously replaced by 1 in the trace on the left-hand side of the equation. However, in the event that the term
arose from the fourth line, then it is possible for one of the traces to be trivial, increasing $p$ by one. Thus, for each
$h \le n$, $k_h - k_{h-1}$ is equal to either $+1$, $-1$, or $-2$. Let $q$ be equal to the number of times the first or
second line was used from Eq. (48) and $n - q$ equal the number of times the third or fourth line was used. Then, in order
for all traces to be trivial in this particular term resulting from $n$ iterations of Eq. (48),

$2k + q - (n - q) - p = 0.$ (52)

Also, since $p$ can only increase when a term from the fourth line is used,

$p \le n - q.$ (53)

Thus,

$p \le \lfloor (2k + n)/3 \rfloor.$ (54)
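The step from (52) and (53) to (54) is a short elimination: (52) gives $p = 2k + 2q - n$, and substituting $q \le n - p$ from (53) yields $3p \le 2k + n$. The sketch below confirms this by brute force over all admissible $q$; the function name is illustrative.

```python
def max_trivial_traces(k, n):
    """Largest p allowed by 2k + q - (n - q) - p = 0 and 0 <= p <= n - q, or None."""
    best = None
    for q in range(n + 1):
        p = 2 * k + q - (n - q)      # Eq. (52) solved for p
        if 0 <= p <= n - q:          # Eq. (53), plus p being a count
            best = p if best is None else max(best, p)
    return best

for k in (1, 2, 3):
    for n in (k, k + 1, k + 5):
        print(k, n, max_trivial_traces(k, n), (2 * k + n) // 3)
```

No solution exists for $n < k$, consistent with the remark below that no term terminates before level $k$ when matrices remain after the initial cancellation.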

Therefore, the value of a term terminating at the $n$th level, $n > 0$, is bounded in absolute value by

$N^{\lfloor (2k+n)/3 \rfloor - n}.$ (55)

Note that if $m_1^0 > 0$ then there are no terms terminating at level $n$ with $n < k$, so for $m_1^0 = 0$, the trace is
equal to $N^{2k}$, while for $m_1^0 > 0$, the terms are bounded in absolute value by $N^0$ (this bound is only reached if
$k = n$).

Eq. (48) generates an infinite series, whose $n$th term is the sum of all terms terminating at level $n$. As in [19],
this series is absolutely convergent for $2km < N$. In fact, the following stronger claim holds: Eq. (48) generates an
absolutely convergent series for $2km - 1 < N$ which converges to the expectation value of the trace. To see this, note
that the value $p$ above, the number of traces of 1, is always bounded by $2km$. Thus, the value of a term terminating
at the $n$th level is bounded by

$N^{2km}N^{-n}.$ (56)

Depending on $n$, sometimes (55) gives a better bound and sometimes (56) gives a better bound, but to estimate
convergence we will use (56). Eq. (51) shows that the number of terms terminating at level $n$ is bounded by
$(2km-1)^n$. Thus, the absolute value of the sum of terms terminating at level $n$ is bounded by
$N^{2km}((2km-1)/N)^n$, and so for $2km - 1 < N$, the series is absolutely convergent. Further, a term which has not
terminated at the $n$th level contains at most $2km$ traces in it, and hence is bounded in absolute value by
$N^{2km}(1/N)^n$. Therefore, the sum of all terms which have not terminated at the $n$th level is also bounded by
$N^{2km}((2km-1)/N)^n$, and hence for $2km - 1 < N$ the series converges to the average of the trace.

D. Example 

We now work out a simple example to give some idea of the use of the Schwinger-Dyson equations. This example
will also be used later in the idea of "complete rung cancellation" and gives intuition behind the claim that for
$N \ge k$ we have $k!$ eigenvalues equal to unity. Let the matrix $X$ be chosen from the unitary group with the Haar
measure and evaluate the expectation value for $N \ge k$

$E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^k\big].$ (57)

For $k = 1$, a single application of Eq. (48) shows that this is equal to unity. For $k = 2$, we find

$E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big] = 2E\big[\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big] - (1/N)\,E\big[\mathrm{tr}(XX)\big(\mathrm{tr}(X^\dagger)\big)^2\big]$ (58)

$\qquad = 2 + (1/N)^2 E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big] - 2(1/N)^2 E\big[\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big]$

$\qquad = 2 + (1/N)^2 E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big] - 2(1/N)^2.$

For $N \ge 2$, this shows that $E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big] = 2$.
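The last line of the $k = 2$ calculation is a linear relation $x = 2 - 2/N^2 + x/N^2$ for the unknown expectation value $x$. The sketch below, which is only an illustration with hypothetical function names, solves it exactly and also iterates it, showing that the relation pins down $x = 2$ for $N \ge 2$ but carries no information for $N = 1$.

```python
from fractions import Fraction

def iterate_recursion(N, start, steps):
    """Apply x -> 2 - 2/N^2 + x/N^2 repeatedly, starting from an arbitrary guess."""
    x = Fraction(start)
    for _ in range(steps):
        x = 2 - Fraction(2, N * N) + x / (N * N)
    return x

def fixed_point(N):
    """Exact solution of x = 2 - 2/N^2 + x/N^2, or None when the relation is vacuous."""
    if N == 1:
        return None  # the relation reads x = x
    return (2 - Fraction(2, N * N)) / (1 - Fraction(1, N * N))

print(fixed_point(2), fixed_point(7))            # both equal 2
print(float(iterate_recursion(3, 0, 10)))        # approaches 2 from any start
print(iterate_recursion(1, 7, 10))               # for N = 1 the start value never changes
```

Each iteration shrinks the distance to 2 by a factor $1/N^2$, which mirrors the way successive levels of the Schwinger-Dyson expansion contribute terms suppressed by powers of $1/N$.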

It is interesting to see what happens to the expectation value in Eq. (58) for $N = 1$, $k = 2$. Then, the last line of
Eq. (58) gives simply
$E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big] = E\big[\big(\mathrm{tr}(X)\,\mathrm{tr}(X^\dagger)\big)^2\big]$,
giving no information about the trace. For general $N$, the sum of terms terminating at level 1 is equal to zero, while
the sum of terms terminating at levels 2, 3, 4, 5, 6, ... is equal to $2, -2/N^2, 2/N^2, -2/N^4, 2/N^4, \ldots$
respectively. Thus, we do not have a convergent series for $N = 1$, $k = 2$.

Up to now we have considered the series whose $n$th term is the sum of terms terminating at a given level $n$. We
now consider instead the expectation value of Eq. (57) as a series in $1/N$. For $N \ge k$, this series is again
absolutely convergent to the desired expectation value. It is easy to see that for arbitrary $k$, and for $N \ge k$, the
expectation value (57) is equal to $k! + O(1/N)$, as there are $k!$ terms which terminate at level $k$. We now show that
for $N \ge k$, the expectation value (57) is equal to $k!$ exactly. Note that the expectation value in Eq. (57) is equal
to the trace of the map $\mathcal{T}_k$ (defined in (34)).

Thus, since $\mathcal{T}_k$ is a projector, the trace of the map $\mathcal{T}_k$ is equal to the number of unit
eigenvalues of $\mathcal{T}_k$. For $N \ge k$ the trace of this map can then be written as the sum of an infinite series
in $1/N$, and using the fact that the number of unit eigenvalues is equal to an integer for all integer $N$, we find that
all terms in the series in $1/N$, beyond the term of order $N^0$, must vanish exactly (the calculation above represents
an explicit check of this for $k = 2$ and it may be readily verified for any $k$). Thus, for all $N \ge k$, the
expectation value of Eq. (57) is equal to $k!$. This gives an alternate proof that $F_N^k = k!$ when $N \ge k$.



E. Counting and Main Result 

In this section we prove a bound on the expectation value of the sum in Eq. (44), which will give us a bound on
the expectation value of the $m$th power of $\lambda$, proving the theorem. The next three paragraphs are devoted to
outlining the basic idea of the proof, before beginning the technical details.

The basic idea of the proof is to prove the bound on the sum by proving a bound on the number of different choices
of $s_1,\ldots,s_m$ such that, when the resulting trace is evaluated using the Schwinger-Dyson equations, there is a term
which terminates at level $n$, for any given $n$. We give this bound on the number of choices of $s_1,\ldots,s_m$ in
Eq. (61). We then combine this bound with a bound on the contribution to the trace of terms which terminate at level $n$.
The idea is that there are only a small number of choices of $s_1,\ldots,s_m$ which produce terms which terminate at a
small level $n$, and while there are a large number of choices of $s_1,\ldots,s_m$ which produce terms which terminate at
high levels, such terms are small.

One technical caveat in this work is that for any choice of $s_1,\ldots,s_m$ there will be certain terms which terminate
at a low level $n$. These are terms in which we use the Schwinger-Dyson equations to contract $U(s_i)$ in one trace
with $U(s_i)^\dagger$ in a different trace. If for some $i$, we contract all unitaries $U(s_i)$ in this way, we have what
is called a "complete rung cancellation" below. We consider such terms separately, and they are responsible for producing
the leading order expectation value of the trace in $1/N$: these terms sum to give a contribution $k!$ to the expectation
value of the trace, precisely corresponding to the contribution we expect from the unit eigenvalues.

Ignoring those terms with complete rung cancellations, we see that a term in the Schwinger-Dyson equations must
involve contracting $U(s_i)$ with $U(s_j)$ or $U(s_j)^\dagger$ for some $i \ne j$. Such terms involve constraints: such a
term would require that either $s_i = s_j + D/2$ or $s_i = s_j$. In order for such a term to terminate at a low level,
there must be many such constraints, and this is why there are only a few choices of $s_1,\ldots,s_m$ which produce terms
which terminate at low levels. To show precisely that there are only a few such choices of $s_1,\ldots,s_m$, we follow a
different strategy. To explain this strategy, suppose you knew a choice of $s_1,\ldots,s_m$ which gave rise to a term
which terminated at some level $n$ and you were given the task of explaining to someone which choice of
$s_1,\ldots,s_m$ you used. One way to do this would be to simply list the $m$ different values of $s$. This would require
communicating $\log_2(D^m)$ bits. We instead show how to uniquely specify the choices of $s_1,\ldots,s_m$ in a different
way, by specifying most of the choices of $s_1,\ldots,s_m$ by describing which cancellations were used. For small $n$,
this will allow one to communicate the specific choice of $s_1,\ldots,s_m$ in a much shorter way, thus implying that
there are only a few choices of $s_1,\ldots,s_m$ which produce the desired term terminating at level $n$. We now put this
idea into practice.

On a given iteration of the Schwinger-Dyson equations, we go from a product of $c$ traces to a product of $c + 1$,
$c - 1$, or $c - 2$ traces. As in [19], we keep track of how the matrices move under this iteration process using a
function $f_n((l,i))$ from pairs of integers to pairs of integers. We say that the matrix $U(s_{l,i})$ in the given
product of traces, $L_1L_2\cdots L_c$, is in position $(l,i)$. Let us consider the case of a term on the first line, where
$c$ increases by one. Then, for any given $j$ in the sum on the first line, we say that the matrix in position $(1,i)$,
for $i < j$, on the $(n+1)$st iteration corresponds to the matrix in position $(1,i)$ on the $n$th iteration, and so
$f_n((1,i)) = (1,i)$, while the matrix in position $(2,i)$ on the $(n+1)$st iteration corresponds to the matrix in
position $(1, i+j-1)$ on the $n$th iteration, so $f_n((1, i+j-1)) = (2,i)$. The matrix in position $(l,i)$, for
$2 < l \le c + 1$, on the $(n+1)$st iteration corresponds to the matrix $(l-1,i)$ on the $n$th iteration, so
$f_n((l-1,i)) = (l,i)$. We follow a similar procedure for the other lines of Eq. (48), and if there are cancellations, we
keep track of how the matrix moves under the cancellations.

We then keep track of which matrix after $n$ iterations corresponds to a given matrix before any iterations, by
defining $F_n((l,i)) = f_n(f_{n-1}(\cdots f_1((l,i))\cdots))$ for $l = 1,2,\ldots,2k$. Let us say that the matrix at
position $(l,i)$ is "trivially moved" under the $n$th iteration of the Schwinger-Dyson equations if it is not in either
position $(1,1)$ or position $(1,j)$ using a term on the first or second line, or in either position $(1,1)$ or position
$(l,j)$ using a term from the third or fourth line. If a matrix is not trivially moved, and the matrix is not in position
$(1,1)$, then the Schwinger-Dyson equations imply a relation between $s_{l,i}$ and $s_{1,1}$.

A given term in Eq. (48) arises from a given choice of $(l,j)$: for a term on the first or second line let us say $l = 1$.
Let $(1,1) = F_n(l_0,j_0)$ and let $(l,j) = F_n(l_0',j_0')$. If a matrix is not trivially moved on the $n$th iteration
then there are two cases. Case (1): either $l_0 \le k$ and $l_0' \le k$, or $l_0 > k$ and $l_0' > k$. That is, either both
matrices appeared in one of the first $k$ traces, which are traces of products of conjugates of unitaries, or both
matrices appeared in one of the last $k$ traces, which are traces of unitaries. Case (2): $l_0 \le k$ and $l_0' > k$, or
$l_0 > k$ and $l_0' \le k$. That is, one matrix was in one of the first $k$ traces and the other was in one of the last
$k$ traces. We then break the first case into two sub-cases: (a) $j_0 = j_0'$ or (b) $j_0 \ne j_0'$. We also break the
second case into two sub-cases: (a) $j_0 = m_1 + 1 - j_0'$ or (b) $j_0 \ne m_1 + 1 - j_0'$. In case 1a both matrices are
unitary matrices $U(s_{1,j_0})$ or both are $U(s_{1,j_0})^\dagger$, and in case 2a, one matrix is $U(s_{1,j_0})$ and the
other is $U(s_{1,j_0})^\dagger$. In case 1b, we know that $s_{1,j_0} = s_{1,j_0'}$ for $j_0 \ne j_0'$, while in case 2b we
know that $s_{1,j_0} = s_{1,j_0'} + D/2$ for $j_0 \ne j_0'$. Thus, in case 1b or 2b the term in the Schwinger-Dyson
equation implies some constraint about the choice of $s_{1,j}$. To illustrate these different cases, consider the example
(58): the first term on the right-hand side of the top line is an example of case 2a, while the second term on the same
line is an example of case 1a.

Consider a given $j$; if on some iteration and for some $l$ the matrix which was originally in position $(l,j)$ is not
trivially moved and we have case 1b or 2b, then we can identify some $j'$ such that either $s_{1,j} = s_{1,j'}$ or
$s_{1,j} = s_{1,j'} + D/2$. Let us write $j' = \tau(j)$ in both cases, for some function $\tau(j)$. We define a term to
have a "complete rung cancellation of matrix $j$" if it is not possible to identify such a $j'$ for the given $j$. We
claim that the sum of all terms with a complete rung cancellation of matrix $i$ is equal to $k!$ so long as $k \le N$. To
show this, consider the product of traces

$\mathrm{tr}\big(U(s_m + D/2)\cdots U(s_{i+1} + D/2)\,X^\dagger\,U(s_{i-1} + D/2)\cdots U(s_1 + D/2)\big)^k\ \mathrm{tr}\big(U(s_1)\cdots U(s_{i-1})\,X\,U(s_{i+1})\cdots U(s_m)\big)^k,$ (59)

where $X$ is some arbitrary unitary matrix. Averaging this trace over all unitary matrices $U(s)$ and over all unitary
matrices $X$ with the Haar measure, we find that the trace is equal to $k!$: this can be established by applying Eq. (48)
to this trace, and always cyclically permuting the trace so that $X$ is in the first position. This calculation is very
similar to the example calculation (57) above. However, applying the Schwinger-Dyson equations to the trace (59)
without first applying the cyclic permutation generates precisely the sum of terms mentioned above, those in which
there is a complete rung cancellation of matrix $i$. Thus, this sum of terms equals $k!$. We further claim that for any
given $i_1, i_2, \ldots, i_d$, the sum of all terms with complete rung cancellations of matrices $i_1, i_2, \ldots, i_d$
is equal to $k!$, as may be shown by considering a trace in which matrices $U(s_{i_1}), U(s_{i_2}), \ldots$ are replaced
by $X_1, X_2, \ldots$, and the trace is averaged over the different $X_1, X_2, \ldots$. Then, using the
inclusion-exclusion principle, the sum of terms in which for no $i$ is there a complete rung cancellation of matrix $i$
is equal to the sum of all terms minus $k!$. So, we now focus on the sum of terms with no complete rung cancellations,
which we define to be $E'_{0,k}(s_1,\ldots,s_m)$; if a given choice of $s_1,\ldots,s_m$ gives rise to a term which
terminates at level $n$ with no complete rung cancellations, then it is possible to identify a $\tau(j)$ for each $j$.

We now follow the same approach as in [19] to bound the number of choices of $s_{1,1},\ldots,s_{1,m_1^0}$ which can
produce a term which terminates at a level $n$ with no complete rung cancellations. Given the sequence of choices of
terms on the right-hand side of the Schwinger-Dyson equation (48), as well as knowledge of which cancellations occurred
at each iteration, we know the function $\tau(j)$, and given this function $\tau(j)$ there are now only at most
$[D/(D-1)](D-1)^{m_1^0/2}$ possible values of $s_{1,1},\ldots,s_{1,m_1^0}$. Thus, the total number of choices of
$s_{1,1},\ldots,s_{1,m_1^0}$ which can produce a term which terminates at level $n$ is bounded by the number of possible
choices of terms and cancellations in the Schwinger-Dyson equation (48) at each of the $n$ iterations multiplied by
$[D/(D-1)](D-1)^{m_1^0/2}$. At each iteration of the Schwinger-Dyson equations, we make a particular choice of $l,j$ at
each level, which requires specifying one particular matrix out of all the matrices on the right-hand side; there are at
most $2m_1^0k - 1$ matrices on the right-hand side, so there are at most $2m_1^0k - 1$ choices (in [19], the slightly
worse bound $(2m_1^0k - 1)^2$ was found; we tighten the bound here). At each such iteration of the Schwinger-Dyson
equations, there may be cancellations in two different traces if the term came from the second line of Eq. (48), with at
most $m_1^0$ cancellations in each trace, or cancellations in two different places of a single trace, if the term came
from the fourth line of Eq. (48), with at most $m_1^0$ cancellations in each place. Let us call the number of
cancellations $c_1, c_2$ with $0 \le c_1 \le m_1^0$ and $0 \le c_2 \le m_1^0$. Then, by specifying $l, j, c_1, c_2$ for
each iteration, we succeed in fully specifying how the matrices move under the $n$ iterations of the Schwinger-Dyson
equation; this requires specifying $n$ numbers ranging from $1,\ldots,2km_1^0 - 1$, and $2n$ numbers ranging from
$0,\ldots,m_1^0$. Thus, there are at most

$[D/(D-1)](D-1)^{m_1^0/2}(2km_1^0 - 1)^n(m_1^0 + 1)^{2n} \le [D/(D-1)](D-1)^{m_1^0/2}(2km_1^0)^{3n}$ (60)

choices of $s_{1,1},\ldots,s_{1,m_1^0}$ which can produce a term which terminates at level $n$. Using Eq. (50), the number
of choices of $s_1,\ldots,s_m$ which can produce a term which terminates at level $n$ is at most

J2{D- l)™/22™(2A:mO)3" < {D - l)"'/^2" ^^^"^ + ^^'"^\ (61) 

m1=0 

For any $s_1,\ldots,s_m$, we define $n_{\min}(s_1,\ldots,s_m)$ to be the smallest level at which a term terminates with
no complete rung cancellations. The sum of terms with $m_1^0 = 0$, which is the same as the sum of terms with
$n_{\min} = 0$, is bounded by

$N^{2k}D^{-m}(D-1)^{m/2}2^m = N^{2k}\lambda_H^m.$ (62)

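Thus, we rewrite the sum in Eq. (44) as

$E_{1,k} \le k! + N^{2k}\lambda_H^m + \left(\frac{1}{D}\right)^m \sum_{n=k}^{\infty}\ \sum_{s_1=1}^{D}\sum_{s_2=1}^{D}\cdots\sum_{s_m=1}^{D} \delta_{n_{\min}(s_1,\ldots,s_m),\,n}\ E'_{0,k}(s_1,\ldots,s_m).$ (63)

Therefore, for any $s_1,\ldots,s_m$ with $n_{\min} > 0$,

$E'_{0,k}(s_1,\ldots,s_m) \le \sum_{n \ge n_{\min}} N^{2k/3 - 2n/3}(2km-1)^n$ (64)

$\qquad = N^{2k/3}\,\frac{\big[N^{-2/3}(2km-1)\big]^{n_{\min}}}{1 - N^{-2/3}(2km-1)}.$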



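From Eqs. (61), (63), and (64),

$$E_{1,k} \le k! + \lambda_H^{m}\left( N^{2k} + N^{2k/3} \sum_{n \ge 1} \frac{(2km+1)^{3n+1}\,[N^{-2/3}(2km-1)]^{n}}{(3n+1)\,[1-N^{-2/3}(2km-1)]} \right)$$

$$\le k! + \lambda_H^{m}\left( N^{2k} + N^{2k/3} \sum_{n \ge 1} \frac{(2km+1)\,\left(N^{-2/3}(2km+1)^{4}\right)^{n}}{(3n+1)\,[1-N^{-2/3}(2km-1)]} \right). \tag{65}$$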

We then pick $m = \frac{1}{4k}N^{1/6}$, so that $N^{-2/3}(2km+1)^{4} \le 1/2$ and

$$|\lambda| \le (E_{1,k}-k!)^{1/m} \le N^{2k/m}\,\lambda_H\,(1+O(1))^{1/m} = \lambda_H\left(1+O(\log(N)\,k\,N^{-1/6})\right). \tag{66}$$

Using Markov's inequality, the probability that $|\lambda|$ is greater than $c\,(1+O(k\log(N)N^{-1/6}))\,\lambda_H(D)$, for any $c > 1$, is
bounded by $c^{-(1/4k)N^{1/6}}$. ∎
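To get a feel for the scales involved, the sketch below evaluates the choice $m = N^{1/6}/(4k)$, the geometric ratio $N^{-2/3}(2km-1)$ that appears in the denominator of Eq. (64), and the Markov tail $c^{-m}$. It is only an illustration under stated assumptions: $m$ is left unrounded (the text does not discuss rounding), the sample values of $N$, $k$, and $c$ are arbitrary, and the function names are hypothetical.

```python
def choose_m(N: float, k: int) -> float:
    """m = N^(1/6) / (4k), left unrounded (assumption: rounding is ignored)."""
    return N ** (1.0 / 6.0) / (4.0 * k)


def geometric_ratio(N: float, k: int) -> float:
    """Ratio N^(-2/3) * (2km - 1) of the geometric series behind Eq. (64)."""
    m = choose_m(N, k)
    return N ** (-2.0 / 3.0) * (2.0 * k * m - 1.0)


def markov_tail(c: float, N: float, k: int) -> float:
    """Tail bound c^(-m) with m = N^(1/6)/(4k), as in the final sentence."""
    return c ** (-choose_m(N, k))


N, k, c = 10**6, 1, 2.0  # arbitrary sample values
print("m =", choose_m(N, k))
print("geometric ratio =", geometric_ratio(N, k))
print("tail bound =", markov_tail(c, N, k))
```

Because $m$ grows only like $N^{1/6}/k$, the tail bound decays slowly in $N$ and weakens as the number of copies $k$ increases.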
IV. DISCUSSION

We have introduced quantum and classical tensor product expanders. These provide a way to approximate t-designs
by acting many times with a small number of unitaries. An important open question is whether efficient
implementations of these tensor product expanders exist. 
APPENDIX A: PROOF OF LEMMA 1

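First, we reduce to the case when the matrices are $2 \times 2$ with $\Pi = |1\rangle\langle 1|$ and $X$ is diagonal. Express $\|pX+(1-p)Y\|$
as the maximum of $\langle\psi|pX+(1-p)Y|\psi\rangle$ over all unit vectors $|\psi\rangle$. Write $|\psi\rangle$ as $|\psi\rangle = \cos(\theta)|\psi_1\rangle + \sin(\theta)|\psi_2\rangle$, where
$0 \le \theta \le \pi/2$ and $|\psi_1\rangle, |\psi_2\rangle$ are normalized vectors such that $\Pi|\psi_1\rangle = |\psi_1\rangle$ and $(I-\Pi)|\psi_2\rangle = |\psi_2\rangle$. Our conditions
on $X$ imply that $\langle\psi|X|\psi\rangle = \cos^2(\theta) + \langle\psi_2|X|\psi_2\rangle \sin^2(\theta)$ and that $|\langle\psi_2|X|\psi_2\rangle| \le 1-\epsilon_X$. Next, for $i,j = 1,2$
define $Y_{ij} = \langle\psi_i|Y|\psi_j\rangle$. Since $\|Y\| \le 1$, we also have that $\|\sum_{i,j=1}^{2} Y_{ij}|i\rangle\langle j|\| \le 1$. We can now replace $Y$ with
$\sum_{i,j=1}^{2} Y_{ij}|i\rangle\langle j|$ and $X$ with $|1\rangle\langle 1| + \langle\psi_2|X|\psi_2\rangle\,|2\rangle\langle 2|$.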

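Now suppose that $|\langle\psi|X|\psi\rangle| > 1 - \epsilon_X\epsilon_Y/12$. Using our bound on $|\langle\psi_2|X|\psi_2\rangle|$, we obtain

$$1 - \frac{\epsilon_X\epsilon_Y}{12} < \cos^2(\theta) + \sin^2(\theta)(1-\epsilon_X) = 1 - \sin^2(\theta)\,\epsilon_X,$$
implying that $\sin^2(\theta) < \epsilon_Y/12$. We will show that this yields an upper bound on $\langle\psi|Y|\psi\rangle$.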



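Since $\|Y\| \le 1$, we have $|Y_{1,2}|, |Y_{2,1}| \le \sqrt{1-|Y_{1,1}|^2}$. Thus

$$\langle\psi|Y|\psi\rangle \le \cos^2(\theta)|Y_{1,1}| + \sin(\theta)\cos(\theta)\left(|Y_{1,2}| + |Y_{2,1}|\right) + \sin^2(\theta)|Y_{2,2}| \tag{A1}$$

$$\le \cos(\theta)|Y_{1,1}| + \sin(\theta)\,2\sqrt{1-|Y_{1,1}|^2} + \frac{\epsilon_Y}{12}. \tag{A2}$$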

If $\theta$ were not constrained then the first two terms of (A2) would be maximized by taking $\theta$ to be $\hat\theta =
\arctan(2\sqrt{1-|Y_{1,1}|^2}/|Y_{1,1}|) \ge \arctan(2\sqrt{2\epsilon_Y-\epsilon_Y^2}/(1-\epsilon_Y)) \ge \arctan(2\sqrt{2\epsilon_Y})$. Using $\sin^2(\arctan(z)) = z^2/(1+z^2)$,
we have $\sin^2(\hat\theta) \ge 8\epsilon_Y/(1+8\epsilon_Y) \ge \epsilon_Y/2$. Since $\theta$ is constrained to lie in $[0, \arcsin(\sqrt{\epsilon_Y/12})]$, it cannot equal $\hat\theta$. Thus
maximizing (A2) will require setting $\theta$ to one of the endpoints of the allowed region. In particular, the maximum
value of (A2) occurs when $\sin^2(\theta) = \epsilon_Y/12$. A similar argument proves that setting $|Y_{1,1}| = 1-\epsilon_Y$ maximizes (A2)
as well. Now we calculate

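$$\langle\psi|Y|\psi\rangle \le (1-\epsilon_Y) + 2\sqrt{2\epsilon_Y-\epsilon_Y^2}\,\sqrt{\frac{\epsilon_Y}{12}} + \frac{\epsilon_Y}{12} \le 1 - \epsilon_Y\left(1 - 2\sqrt{\frac{2}{12}} - \frac{1}{12}\right) \le 1 - \frac{\epsilon_Y}{10}. \tag{A3}$$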
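The last step of (A3) is tight in its constant: $1 - 2/\sqrt{6} - 1/12 \approx 0.1002$, which only just exceeds $1/10$. The sketch below checks the chain numerically on a grid of $\epsilon_Y \in (0,1]$, assuming the first bound in (A3) reads $(1-\epsilon_Y) + 2\sqrt{2\epsilon_Y-\epsilon_Y^2}\sqrt{\epsilon_Y/12} + \epsilon_Y/12$; the function names and the grid are illustrative choices.

```python
import math


def a3_first_bound(eps: float) -> float:
    """(1 - eps) + 2*sqrt(2*eps - eps^2)*sqrt(eps/12) + eps/12 (assumed reading of (A3))."""
    return (1.0 - eps) + 2.0 * math.sqrt(2.0 * eps - eps * eps) * math.sqrt(eps / 12.0) + eps / 12.0


def a3_final_bound(eps: float) -> float:
    """The final bound 1 - eps/10 in (A3)."""
    return 1.0 - eps / 10.0


# Constant left over after bounding sqrt(2*eps - eps^2) by sqrt(2*eps).
slack_constant = 1.0 - 2.0 / math.sqrt(6.0) - 1.0 / 12.0

# Check the inequality on a uniform grid over (0, 1].
grid_ok = all(a3_first_bound(i / 1000.0) <= a3_final_bound(i / 1000.0) for i in range(1, 1001))
print("slack constant:", slack_constant)
print("inequality holds on the grid:", grid_ok)
```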

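We have shown that for any $\psi$, either $\langle\psi|X|\psi\rangle \le 1 - \epsilon_X\epsilon_Y/12$ or $\langle\psi|Y|\psi\rangle \le 1 - \epsilon_Y/10$. We now use the triangle
inequality to bound

$$\langle\psi|pX+(1-p)Y|\psi\rangle \le \max\left( p\left(1-\frac{\epsilon_X\epsilon_Y}{12}\right) + (1-p),\; p + (1-p)\left(1-\frac{\epsilon_Y}{10}\right) \right)$$

$$\le 1 - \frac{\epsilon_Y}{12}\min(p\,\epsilon_X,\, 1-p).$$

Since this bound applies for all normalized $|\psi\rangle$, it must also upper-bound $\|pX+(1-p)Y\|$. Thus we obtain the claimed bound. The
remaining steps of the Lemma are direct calculations. ∎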
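The $2 \times 2$ reduction used in this proof is easy to test numerically. The sketch below samples real symmetric instances with $X = \mathrm{diag}(1, x)$, $|x| \le 1-\epsilon_X$, and a contraction $Y$ with $|Y_{1,1}| = 1-\epsilon_Y$, then compares $\|pX+(1-p)Y\|$ with $1 - (\epsilon_Y/12)\min(p\epsilon_X, 1-p)$. The hypotheses on $X$ and $Y$ are inferred from the proof rather than quoted from the Lemma, so they are assumptions, as are the sampling ranges, the tolerance, and the function names.

```python
import math
import random


def sym2_norm(a: float, b: float, d: float) -> float:
    """Spectral norm of the real symmetric matrix [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    return max(abs(mean + radius), abs(mean - radius))


def lemma_holds_once(rng: random.Random) -> bool:
    # Random symmetric Y, rescaled so that its norm is at most 1.
    a, b, d = (rng.uniform(-1.0, 1.0) for _ in range(3))
    scale = max(1.0, sym2_norm(a, b, d))
    a, b, d = a / scale, b / scale, d / scale
    eps_y = 1.0 - abs(a)  # largest eps_Y with |Y_11| <= 1 - eps_Y
    eps_x = rng.uniform(0.01, 1.0)
    x = rng.uniform(-(1.0 - eps_x), 1.0 - eps_x)  # X = diag(1, x)
    p = rng.uniform(0.01, 0.99)
    # Entries of p*X + (1-p)*Y.
    norm = sym2_norm(p + (1.0 - p) * a, (1.0 - p) * b, p * x + (1.0 - p) * d)
    bound = 1.0 - (eps_y / 12.0) * min(p * eps_x, 1.0 - p)
    return norm <= bound + 1e-12  # small tolerance for rounding


rng = random.Random(0)  # fixed seed for reproducibility
violations = sum(not lemma_holds_once(rng) for _ in range(10000))
print("violations:", violations)
```

Choosing $\epsilon_Y = 1 - |Y_{1,1}|$ makes the bound as strict as the hypotheses allow for each sample, so this is the hardest case to satisfy.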



